ABSTRACT

Guidelines for basic sampling and weighting
procedures for Massachusetts' Management and Information System for
Occupational Education (MISOE) are discussed, illustrating solutions
relevant for systems development. Disproportionate, stratified random
sampling within and across MISOE subsystems will be carried out, with
a hierarchical stratification over specified control dimensions.
These dimensions will be: (1) sectors and subsectors, (2) educational
programs, (3) regions where feasible, (4) locales and school types,
and (5) students. As the second of two papers prepared to delineate
MISOE's design, these planning considerations are detailed separately
for the secondary school sector, the post-secondary programs, and
various adult vocational education and manpower programs. Separate
weighting procedures may be required for economic and noneconomic
data, due to their distinct roles in MISOE. Although the
multidimensional distribution of students demands stratification
control, the program size limits the amount of control possible.
Numerous recommendations are included, noting needs for further
development of various aspects of MISOE. Cost effectiveness data are
appended for followup purposes. Related documents are available as VT
018 602, VT 018 606, VT 018 809, and VT 018 810 in this issue.
(AG)
PREFACE

This paper is one of a series prepared by the staff and a team of
consultants to delineate and document the design of the Management and
Information System for Occupational Education. It is the second of two such
papers by the author, submitted as the formal response to staff inquiries,
and as major tangible products of the consultation relationship. Gratitude
is expressed to the staff for its extensive help in providing preliminary data
and in conferences. The present paper discusses sampling and weighting issues
and illustrates solutions relevant to system development. The staff is
encouraged to make selective and flexible adaptation of the aids offered.

Occasional Paper #10 makes several references to Technical Memorandum
#2, which is included in this publication as Appendix B.
SAMPLING AND WEIGHTING CONSIDERATIONS FOR MISOE
John A. Creager

Part I. Sampling and Weighting Requirements for MISOE

MISOE requires collection of a high volume of data on a wide range of
variables, over a hierarchically structured, varied, and geographically dispersed
set of observation units. Under such circumstances it is neither logistically
nor economically feasible to obtain all data on the complete set of observation
units. Some data items can, indeed must, be obtained on such a census basis.
Occasional Paper No. 1 defines this need in terms of:

1. providing a basis for selective trends data on enrollments,
expenditures, and specific performance objectives, by program,
locale, and school,

2. providing an annual description of same, and 

3. establishing a population base for sampling. 

The last implies the need for census information to control the sampling and to
provide a basis for weighting sampling data to be representative of the
populations of concern. The weighting procedures are required to adjust the
sampling data for the sampling ratios in random sampling, and for any biases
due to nonrandom sampling.
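
The adjustment for sampling ratios can be sketched as follows. This is a minimal illustration, assuming that the census supplies a population count for each sampling cell and that units are drawn at random within a cell; the function name, cell labels, and counts are hypothetical and are not MISOE data.

```python
def cell_weights(population_counts, sample_counts):
    """Weight for each cell = census count / sample count (inverse sampling ratio)."""
    weights = {}
    for cell, n_population in population_counts.items():
        n_sample = sample_counts.get(cell, 0)
        if n_sample == 0:
            raise ValueError(f"cell {cell!r} has no sampled units to weight")
        weights[cell] = n_population / n_sample
    return weights


# Hypothetical census and sample counts for two stratification cells.
population_counts = {"cell_1": 600, "cell_2": 400}
sample_counts = {"cell_1": 200, "cell_2": 400}

weights = cell_weights(population_counts, sample_counts)
# Applying the weights to the sample counts recovers the census total.
estimated_total = sum(weights[cell] * sample_counts[cell] for cell in sample_counts)
print(weights, estimated_total)
```

A cell sampled at 100% receives a weight of 1, while a cell sampled at one in three receives a weight of 3. Corrections for nonrandom losses would require further adjustment not shown here.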

This paper presents the guidelines for basic sampling within and across
MISOE subsystems, using disproportionate, stratified random sampling. By
disproportionate is meant variable sampling ratios for the various cells of the
sampling structure. Stratification will be hierarchical over specified control
dimensions. Primary observation units will be programs; secondarily, students
who are eligible for and/or actually flow through the system are also
observation units, so that sampling and weighting procedures must take this
students-within-programs concept into account. Similarly, account must be taken
of the way in which programs are distributed among schools and locales. Except
for Boston, with about 15 schools, locales and schools will be treated on a
one-to-one basis; only minor logistic adjustment should be required in a few
other cities having 2-6 schools.

Illustrative sampling plans for the major sectors of occupational
education will be offered in Parts II and III of this paper. These plans
can only be regarded as tentative, to be modified by more complete
information, when available, on the distribution of students among programs and
locales. The missing information is unevenly distributed across regions (and
possibly across programs), and in some cases is lacking from locales with
high population densities. Every reasonable effort should be made to fulfill
the staff expectations to complete this information in the census data system
and to ascertain what adjustments in the sampling plans are to be made to
ensure proper selection and representation of programs within each sector.
Presently available information does not indicate the school distribution
within the City of Boston (035) and this may be required in some programs
given in more than one school.

Special sampling considerations will be presented in Part IV for
conducting followup surveys to obtain some of the post-impact information,
for initiating the system on a cross-sectional basis, for dealing with cohort
replacements, and for obtaining control groups. The weighting procedures
required for estimating population parameters will be discussed in Part V,
including those for control of bias in the longitudinal files due to nonresponse
to followups.

If sampling and weighting procedures are properly handled in descriptive
space as part of the data-entry system, no serious inferential problems should
arise in high level descriptive or simulative analysis from this source. There
are, however, certain limitations on the degree of control of sampling errors
and their effects in such a system and these will be discussed. Occasional
Paper No. 7 suggested how weights can be posted to data records and used in
cumulating operations for estimating population parameters. Sampling variance
in such estimates will be discussed; the need for this is stated on page 10 of
Occasional Paper No. 1, which states a goal in terms of confidence limits.



Because there exists a great variation in program size and distribution
of program offerings across sectors, school types, and locales, and because
there are constraints on the costs and logistics of data collection, sampling
and weighting considerations must take cognizance of the three-stage information
collection process specified in Technical Memorandum #2, July 10, 1972.*
Moreover, sampling and weighting procedures must be developed and implemented in
a manner consistent with analysis requirements and considerations indicated
in the previous occasional papers. It should be noted that the three stages
of data collection are associated, respectively, with the Census Data System
(CDS) and with the Sampling Data System in two parts, SDS (1) and SDS (2).
The two parts of the SDS refer to data limited to input, cost, and impact for
smaller programs in SDS (1) and full data, including those for process and
product, for the larger programs in SDS (2). The data in the CDS are limited
to costs, enrollments, completions, age, sex, and race. These are the
variables which are available as bases for weighting sample data to population
estimates.

The next section discusses stratification and standard errors to the 
point necessary to specify options available in small programs, and to sug- 
gest flexibility in setting the cutting point that separates which programs 
are assigned to SDS (1) and SDS (2). 

*See Appendix B.
Stratification

The basic principle to ensure unbiased estimates of population parameters
is randomization. This requires that each observation unit within a cell of
the stratification design has an equal opportunity to appear in the sample.
The sampling design will specify procedures for selecting schools and programs
and it will be assumed that sufficient logistic control is available to
prevent nonrandom losses of selected units. For a program within a school, it
will be generally assumed that all students will be included in the sample
except in a large school-program combination, where logistic arrangements must
be made to ensure random sampling of the students therein. When data must be
obtained from students, scheduling should be done in such a way as to avoid
religious and ethnic holidays, football practice, or other special situations
such that some proportion (more than 5%) of a special subgroup of students will
be missing. Where this cannot be avoided, arrangements should be made for
"makeup" data collection. Where some proportion of students within a large
program given in a particular school are to be sampled, a selection by sections
should be avoided if at all possible, unless there is good reason to believe
that section assignments were random.

Stratification is introduced at the base of the sampling procedures
for the following purposes:

1. to ensure some random sampling within and over all sectors and
subsystem spaces of MISOE concern;

2. to reduce the sampling error which would be obtained from simple
random sampling over the entire system;

3. to allow disproportionate sampling to increase efficiency (using
lower sampling ratios for homogeneous parts of the system; larger for
heterogeneous parts or for ensuring adequate sampling of important but low
base rate subgroups of observation units).
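
Disproportionate stratified random sampling can be illustrated with a short sketch. It assumes each cell is available as a list of unit identifiers and that a sampling ratio has been chosen for each cell; the function name, cell names, ratios, and identifiers are hypothetical, and the seed is fixed only so that the draw can be repeated.

```python
import random


def stratified_sample(units_by_cell, ratio_by_cell, seed=0):
    """Draw a simple random sample within each cell at that cell's own ratio."""
    rng = random.Random(seed)
    sample = {}
    for cell, units in units_by_cell.items():
        n_take = round(ratio_by_cell[cell] * len(units))
        # Keep at least one unit per cell and never more than the cell holds.
        n_take = max(1, min(len(units), n_take))
        sample[cell] = rng.sample(units, n_take)
    return sample


# Hypothetical cells: a large homogeneous cell sampled lightly, and a small
# low base rate cell taken in full.
units_by_cell = {
    "homogeneous": list(range(50)),
    "low_base_rate": list(range(100, 108)),
}
ratio_by_cell = {"homogeneous": 0.10, "low_base_rate": 1.00}

sample = stratified_sample(units_by_cell, ratio_by_cell)
print({cell: len(units) for cell, units in sample.items()})
```

Because the ratios differ by cell, estimates for the whole system need cell weights equal to the inverse of each cell's realized sampling ratio.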

The reduction in sampling errors in estimating population parameters
(#2, above) is the classical reason given for stratification. Such reduction
is a function of the correlation between the variables controlling the
sampling design (dimensions of stratification) and the data to be obtained on the
observation units. In large-scale programs this is a secondary consideration
because no set of control variables is likely to be highly correlated across
the highly diverse set of data variables involved. Moreover, purposes 1 and
3, above, are more critical for such general programs. The control dimensions
are therefore chosen with these purposes in mind. The result is that some
reduction in sampling errors from that observed under simple random sampling
may be observed, but this will generally be small, especially as compared with
the reduction in sampling error (as normally computed for infinite populations)
due to sampling finite populations.

The control dimensions will be: (1) sectors and subsectors; (2) educational
programs; (3) regions where feasible; (4) locales and school types; (5) students.
In some cases, programs and where they are given may jointly define a
stratification cell. There is too much variation in locales and enrollments among
programs within program types for the types to be a simplifying basis of
stratification. It will, however, be convenient to discuss program sampling
by their typological groupings (04, 07, 09, 14, 16, 17). Sampling
considerations for each sector will be discussed separately in Parts II and III
of this paper.

Classical sampling theory and its associated inferential statistics 
are essentially static in conceptualization, making no explicit reference to 
temporal changes in population parameters. In the absence of any dynamic 
sampling theory, longitudinal programs are forced to make certain conceptual 
and operating adjustments within the constraints of static inferential
statistics. Actually, classical theory does provide some flexibility in these
matters, so that the problem is not critical, but requires staff awareness for
proper formulation and interpretation of any significance tests. For example,
it is both feasible and valid to make such tests as the following:

1. Given two samples of a population, the samples obtained at
different times, set up the null hypothesis that the population parameters
have not changed; if sample parameter differences are larger than expected
under random sampling of a population without temporal change, reject the null
hypothesis and conclude that a change has occurred in the population. Use
two-tailed tests.

2. Given the situation in #1, set up the asymmetric null hypothesis
that no change in population parameter has occurred in a specified direction,
and accept or reject this hypothesis on the basis of a one-tailed test.

3. Given the situation in #1, set up the null hypothesis that a
change of specified amount and direction has not occurred; use the one-tailed
test with the hypothesized difference subtracted from the observed difference
in the numerator for computing the test statistic.
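
The three tests can be sketched in one function. This is a minimal illustration, assuming two independent samples, known standard errors for each sample statistic, and a large-sample normal approximation; the source does not name the test statistic, so the use of a z ratio here is an assumption, as are the function names and the numbers.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def change_test(stat_1, se_1, stat_2, se_2, hypothesized_change=0.0, tails=2):
    """Return (z, p) for the change from time 1 to time 2.

    tails=2 tests for any change (test 1); tails=1 tests for an increase
    (test 2); a nonzero hypothesized_change with tails=1 tests for an
    increase larger than that amount (test 3).
    """
    observed_change = stat_2 - stat_1
    se_change = math.sqrt(se_1 ** 2 + se_2 ** 2)
    z = (observed_change - hypothesized_change) / se_change
    if tails == 2:
        p = 2.0 * (1.0 - normal_cdf(abs(z)))
    else:
        p = 1.0 - normal_cdf(z)
    return z, p


# Hypothetical sample means and standard errors at two time points.
z_two, p_two = change_test(50.0, 1.5, 54.0, 2.0)
z_one, p_one = change_test(50.0, 1.5, 54.0, 2.0, tails=1)
z_shift, p_shift = change_test(50.0, 1.5, 54.0, 2.0, hypothesized_change=1.0, tails=1)
print(z_two, p_two, p_one, z_shift, p_shift)
```

For a decrease rather than an increase, the two time points can be interchanged. When sampling is from a finite population, the standard errors passed in should already reflect that.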

Given a priori that some population changes are going to occur, in the
longitudinal observation of a single cohort, or in the replacement of cohorts,
practical suggestions can be made to keep the analytical base under some control
for inferential error from sampling and weighting procedures. These are:

1. For a single cohort passing through a process channel, refer 
the results of student analysis back to the initial input situation; 

2. When replacing a cohort at variable time points for different 
process channels, update the sampling and weighting parameters for that program 
so that analysis results will be representative of the population changes. 

The first of these is rather straightforward; the second requires some 
further comment. In MISOE it will not be necessary to adjust the definition 
of the population every time a cohort replacement occurs. The practice of
keeping track of changes in the population counts of schools and programs
in the CDS permits periodic account to be taken of the fact that new
schools and new programs are started, some become defunct, some may even merge,
and there may be administrative shifts in the school-program joint arrangements.
Such changes imply changes in the cell weights for new cohorts.

The important distinction between economic and noneconomic data and their 
roles in MISOE raises the question of whether separate sampling and weighting 
procedures are required. In the case of sampling, common procedures are 
necessary to ensure the match between economic and noneconomic data on a 
common base. Except for economic data expressed on a per student basis,
separate weighting procedures, based on economic census data rather than on
enrollments, may be required. This topic will be discussed further in Part V. Sampling
plans based on the distribution of students among higher dimensions of strati- 
fication should be checked for within-cell homogeneity of economic census data. 

The major constraining problem for program-oriented sampling in MISOE
is the multimodal, multi-skewed, multidimensional distribution of students
across school types and locales where programs are given, even when this is
considered within the major sectors of occupational education. This demands
stratification control, while program size limits the amount of control that
is logistically feasible. This limitation is compounded by the fact that
random sampling errors in descriptive statistics aggregated for programs,
school types, or locales are inversely proportional to the square root of the
number of observation units (students) on which the statistics are based. For
example, the standard error of the correlation coefficient when the population
parameter is actually zero is $1/\sqrt{N}$, so that even with 225 cases, the
standard error is about .07, and sample correlations may fluctuate ± .14
(approximate 95% confidence limits). It takes about 1,000 cases to reduce the
random sampling error to .03. The implication is that correlational analysis
across IPPI elements as discussed in Occasional Paper #7 will be rather unstable,
except in the larger programs, and probably should not be attempted where fewer
than 300 cases are available in the sample ($\sigma$ ca. .058; C.L. ca. ±.12).
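
The figures quoted for the correlation coefficient can be checked with a few lines. The sketch assumes the large-sample approximation stated in the text (standard error of one over the square root of the number of cases when the population correlation is zero) and takes the approximate 95% limits as two standard errors; the function name is hypothetical.

```python
import math


def se_zero_correlation(n_cases):
    """Approximate standard error of r when the population correlation is zero."""
    return 1.0 / math.sqrt(n_cases)


for n_cases in (225, 300, 1000):
    se = se_zero_correlation(n_cases)
    # Two standard errors give the approximate 95% confidence limits.
    print(n_cases, round(se, 3), round(2.0 * se, 3))
```

The rounded results agree with the values in the text: roughly .07 at 225 cases, .058 at 300 cases, and .03 at 1,000 cases.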

The standard error of the mean is a function of the standard deviation
of the variable; however, reasonably stable means for continuous variables can
usually be obtained with 30 cases. A categorical percentage, e.g., percent
of students completing a phase or achieving a product standard, of 50% (the
parameter value where the sampling error is maximum) requires about 100 cases
in the sample for the standard error to be even as small as 5% (i.e., 95%
confidence limits would be about 40-60%). The standard error of a standard
deviation is inversely proportional to $\sqrt{2N}$ and is therefore less
sensitive to sampling variations than is the mean.
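
The standard errors discussed in this paragraph can be written out under the usual simple random sampling formulas for an effectively infinite population. These are textbook approximations rather than formulas quoted from the source, and the function names are hypothetical.

```python
import math


def se_mean(sd, n_cases):
    """Standard error of a mean: sd / sqrt(N)."""
    return sd / math.sqrt(n_cases)


def se_proportion(p, n_cases):
    """Standard error of a proportion: sqrt(p(1 - p) / N); largest at p = .5."""
    return math.sqrt(p * (1.0 - p) / n_cases)


def se_standard_deviation(sd, n_cases):
    """Approximate standard error of a standard deviation: sd / sqrt(2N)."""
    return sd / math.sqrt(2.0 * n_cases)


# A 50% completion rate observed on 100 cases.
se = se_proportion(0.5, 100)
lower, upper = 0.5 - 2.0 * se, 0.5 + 2.0 * se
print(round(se, 3), round(lower, 2), round(upper, 2))
```

With 100 cases the standard error of a 50% figure is 5 percentage points, which gives the limits of about 40-60% mentioned above.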

To be sure, the sampling errors for MISOE are reduced, markedly, by the
fact that sampling is from a finite population (the reduction factor is
$\sqrt{1 - N_s/N_p}$, where $N_s$ is the sample size and $N_p$ the population
size) and in the case of 100% sampling, reduces to zero. However,
management decisions based on information from one cohort might be reasonably
applied to later cohorts, so that there is a sense of sampling variance over
time from a larger, less definite population. Technically, we are sampling
from a population at a given point in time, but trying to generalize results
to a population accumulated over a period of time. As MISOE obtains data on
replacement cohorts, it will be possible to examine such temporal variations
in sample data from programs which have not changed appreciably in input or
process spaces.
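
The effect of sampling from a finite population can be sketched with the reduction factor described above, the square root of one minus the ratio of sample size to population size. The parameter names and the example enrollment are hypothetical.

```python
import math


def finite_population_factor(n_sample, n_population):
    """Factor by which an infinite-population standard error is multiplied."""
    if not 0 < n_sample <= n_population:
        raise ValueError("sample size must be between 1 and the population size")
    return math.sqrt(1.0 - n_sample / n_population)


# Hypothetical program of 400 students sampled at several ratios.
for n_sample in (100, 200, 380, 400):
    factor = finite_population_factor(n_sample, 400)
    print(n_sample, round(factor, 3))
```

At a 95% sampling ratio the factor is already below one quarter, and at 100% it is zero, which is why census-level coverage leaves no sampling error for the cohort actually observed.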

The implications of these considerations for MISOE are that full
analysis possibilities will be feasible only for larger programs and for suitably
weighted aggregates across programs (e.g., when comparing OE and non-OE).
Technical Memorandum #2 defines the larger programs for SDS (2) as "approximately
800 or more", and lists a few which are somewhat smaller. This paper considers
sampling for programs within each sector so that a program manager has the
option of analysis within sectors, which may differ considerably in locations,
student types, costs, process details, and even objectives, or may consider
program analysis over pooled sectors. This consideration and the fact that
the sampling ratios can be varied, has led to consideration of sampling designs
for programs within sectors. Combining this notion with the sampling error
restriction leads to 300 students as an approximate cutting point on enrollment
in a program within a sector for describing possible sampling and inclusion in
SDS (2). Note that this merely provides more flexibility at high costs and
logistic effort, but does not require implementation. In a couple of programs
with small but appreciable numbers in some sectors in addition to the major
portion being in one of the sectors, it may be desirable to form one or more
ad hoc cells to ensure representation of students in these sectors even though
within sector analysis may not be possible. Thus, in the ensuing discussion,
"small programs" are those with fewer than 300 students and within these programs
MISOE has the following options:

1. Obtain input-cost-impact data on 95-100% samples (SDS-I).
Analysis within the small programs will be limited to gross entry level
descriptive statistics, but can be aggregated with data from other programs
for across-program analysis including correlational and simulational types.
This option is indicated for programs in the enrollment range of 30-300 students.

2. Consider pooling cohorts to build up more adequate samples in
small, stable, but costly (or otherwise "important") programs. This option
is relatively more attractive in the enrollment range of 100-250 and where
the program length is short so that two cohorts will cumulate a sample close
to 300 in reasonable time.*

*This does not take into account losses due to nonresponse to followup
surveys. See epilogue to Part III, pp. 25-26.

3. Consider pooling two or more small programs within a general
program type and which are highly similar. This option is not generally
attractive because of the program orientation and interest in program variations
in MISOE. Nevertheless, it should not be entirely ruled out where information
about inputs, process, costs, and objectives indicates high similarity.
Dichotomous variables indicating programs from which particular student records
come can be included in regression analysis. Similar logic applies to dealing with
the -9900, OTHER "programs" which may be a mixture of quite similar, but not
identical small programs. Admittedly, this option is not attractive and its
use presumes not only desperation, but further information about program
homogeneity.
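
The cutting point and the three small-program options can be summarized as a simple lookup. This is only a sketch of the enrollment ranges stated above: the function name and wording of the returned labels are hypothetical, the cutting point is adjustable as the text allows, and for programs under 30 students the text gives no range, so the sketch lists option 3 alone.

```python
def small_program_options(enrollment, cutting_point=300):
    """List the options the text indicates for a program of a given enrollment."""
    if enrollment >= cutting_point:
        return ["candidate for sampling and inclusion in SDS (2)"]
    options = []
    if 30 <= enrollment < cutting_point:
        options.append("option 1: input-cost-impact data on a 95-100% sample")
    if 100 <= enrollment <= 250:
        # More attractive where program length is short, which is not checked here.
        options.append("option 2: pool cohorts to approach 300 cases")
    options.append("option 3: pool highly similar small programs (needs evidence of homogeneity)")
    return options


for enrollment in (645, 141, 45):
    print(enrollment, small_program_options(enrollment))
```

The function only narrows the choices; considerations such as program cost, stability, and length still decide among the options listed.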

Another problem is that in those larger programs where sampling is
feasible, the "same" program may have variations in inputs, processes, costs,
and products at various locations where the program is given. The only way
to ensure that all such variations are represented is to sample students within
all locations within a program. This is not usually a logistically efficient
procedure even where otherwise feasible. It should nevertheless be possible
to ensure some representation of such variations in a program-oriented sampling
procedure. Some will come into the sample by chance; alternatively, one can
force them in, but this violates random sampling, unless an ad hoc cell is
defined for them. With most of the programs, the size does not permit this
without giving up some of the other stratification dimensions for that program.
In the case of picking up special variations by chance, tagging such variations
in the sample will provide some basis for studying their efficacy.

In some instances a program given at, say, 3 locations has 80% of the
students at two locations and 20% at the third. Depending on the sampling
ratio desired, one may wish to take two of the three locations and must pick
them without randomization. Similarly, one may know of a particular program
the sample data system. Such violation of pure randomness may be tolerated,
if not too frequent; the price is additional uncertainty in the estimates of
population parameters, weighting factors, and sampling errors.

It is likely that the initial sampling efforts during the development
phase of MISOE will provide valuable experience concerning costs of staff
efforts in sampling and data collection, and that such experience will be
useful in modifying detailed sampling procedures for cohort replacements or
deciding the feasibility of moving some medium-size programs from SDS-II to
SDS-III, or vice versa.

Part II considers the sampling issues for the Secondary School Sector
of Occupational Education. Part III does likewise for the other sectors,
except for the presecondary sector, consisting of about 100 students in home
economics. If this becomes of importance, the small program options, above,
will be applicable.

No attempt will be made in the illustrative sampling plans to control
on student types (e.g., sex, race, etc.); however, differential weighting
options will be offered in Part V. Cell definitions and sampling suggestions
will frequently refer to schools and school types, but it should be understood
that this refers to the students within the schools sampled at 100% unless
otherwise indicated.
Part II. Sampling Considerations in the Secondary School Sector

The secondary school sector is the largest in terms of programs, school
types, and variations in locales over the six regions, with very different numbers 
of students at the locales. The sampling will be discussed, illustratively, 
with comments for those programs having over 300 students in the secondary 
sector. It will be convenient to discuss these programs in groups by general 
program type. Programs with a large concentration in Boston will be given a 
special stratification cell without marked reduction in the sampling ratio to 
ensure representation of special racial-ethnic or other low base-rate groups 
concentrated in the metropolitan area. 

The Agricultural Group-01

With the exception of Agricultural Production 010100, having 387 students,
all agricultural programs are small. If the exception is admitted to SDS (2),
take all secondary school cases in schools with at least 10 students and
ignore the rest. If you desire to take a more modest sample
for SDS (1) only, form two cells, one with about M of the Boston students and
the other with all students in two of the schools. In SDS (1) you may want to
sample 010500 Ornamental Horticulture, taking the students from two of the schools
(use 95-100% in the other small programs).

The Distribution Group-04

The Apparel and Accessories program 040200 with 645 students can be
sampled in one cell at 50%. If the present distribution holds up when the
no-information school data become available, the sample may be taken entirely
from Boston, but it might be better to include all from school 262.

The Finance and Credit program 040400 has about 33a, practically all in
the Boston area. Take 85-100% for SDS (2) if desired, depending on the
within-Boston school distribution of this program. For SDS (1), a smaller sampling
ratio may be taken. The Food Service program 040700 with 141 students must be
in SDS (1), but it may be noted that all but 14 are in three schools, which
may be sampled 95-100%.

The General Merchandise program 040800 with about 1600 students is the
largest in the group and provides opportunity for a well stratified sample of
about 33% or 500+ students for analysis in SDS (2). Sample as follows:

Cell 1 - Boston - take 33-100% of students depending on the
school distribution

Cell 2 - take all students in three of the 10 schools of Region I

Cell 3 - take two of the six schools in Region II

Cell 4 - take two of the five schools in Region III

Cell 5 - take one of the four schools in Region IV

Cell 6 - school 332 in Region V and school 236 in Region VI are
in the population; take 332 at least, with 236 optional. Sampling in this cell
may have to be modified when information becomes available for other schools
in Regions V and VI.
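
The school selection in Cells 2 through 5 of this plan can be sketched as a random draw of schools within each regional cell. The numbers taken and available come from the plan above; the school identifiers are placeholders, the function name is hypothetical, and the school-level weight shown (schools in the cell divided by schools taken) is an assumption that ignores differences in enrollment among schools.

```python
import random

# (schools to take, schools in cell) for Cells 2-5 of the General Merchandise plan.
CELL_PLAN = {
    "Cell 2 (Region I)": (3, 10),
    "Cell 3 (Region II)": (2, 6),
    "Cell 4 (Region III)": (2, 5),
    "Cell 5 (Region IV)": (1, 4),
}


def select_schools(schools_by_cell, plan, seed=0):
    """Randomly choose the planned number of schools in each cell."""
    rng = random.Random(seed)
    selection = {}
    for cell, (n_take, n_total) in plan.items():
        schools = schools_by_cell[cell]
        if len(schools) != n_total:
            raise ValueError(f"{cell}: expected {n_total} schools, got {len(schools)}")
        selection[cell] = {
            "schools": rng.sample(schools, n_take),
            "school_weight": n_total / n_take,
        }
    return selection


# Placeholder school identifiers, not actual MISOE school codes.
schools_by_cell = {
    cell: [f"{cell[5]}-{i:02d}" for i in range(1, n_total + 1)]
    for cell, (_, n_total) in CELL_PLAN.items()
}
selection = select_schools(schools_by_cell, CELL_PLAN)
for cell, chosen in selection.items():
    print(cell, chosen["schools"], round(chosen["school_weight"], 2))
```

Since all students in a chosen school are taken, a student-level weight based on census enrollments in the cell would normally replace the school-level weight once those counts are available.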

The Insurance program 041300 with 349 students, all in Boston schools,
may be included in SDS (2) with 80-100% sampling; for SDS (1) drop sampling
ratio to about 50%.



The Health Group-07

Subjects in this program group are generally found in the
post-secondary sectors, with only 425 indicated in the secondary sector, and 333
of these are in the 079900 Other category (see option 3, page 10, however).

The Home Economics Group-09

The large Comparative Homemaking program 090100 with 4300+ students
can be sampled in the 1/10 to 1/8 range to yield a sample for SDS (2) of
400-500. Use 6 cells: Cell 1 - take 2 of the 12 non-regular schools (400-885);
Cells 2 through 5 - take 1 regular high school from each of the Regions I-IV;
Cell 6 - take a school from Regions V and VI. Possible modifications are
splitting Cell 6 if enough additional subjects are available when the NI
schools have reported, and the addition of a cell for the 381 at Lowell. The
latter will increase the sample considerably unless a random sample of students
(about 100) were taken within this cell. Note, too, the special sex distribution
in school 0^6, which could also form a special cell with random subsampling.

Care and Guidance of Children 090201 with 734 students can be sampled
at about in 2 cells: Cell 1 - take 3 of the 6 schools in Region I, and
Cell 2 - take 3 of the 7 schools in the other regions. Also select by toss
of a coin either school 405 or 605 and add to Cell 2.

Food Management 090203 with 345 students, 200 of whom are at Lowell, 
can be sampled 100% for SDS (1) or SDS (2), or reduced a little for logistic 
convenience by placing Lowell in one cell with 100% sampling and taking half 
of the remaining schools in a second cell. 

The Office Group-14

Most of the programs in this Group are large enough and concentrated
enough in the secondary sector to permit extensive sampling for inclusion in
SDS (2). In Accounting and Computing 140100 with 13K students, aim at about a
5-10% (650-1300 students) sample. Take one or two schools in a cell: Cells
1-6 for regular high schools in Regions I-VI, respectively, Cell 7 for Boston
and Cell 8 for the Regional schools.

In Business Data Processing 140200, aim at about 1/10 sample of the
5K students. Use the same cell structure and sampling as in program 090100,
except to add an ad hoc cell for Boston in which a logistically convenient
sampling ratio may be used depending on the within-Boston school distribution
for this program. About 100 subjects would be reasonable. Note that
oversampling in the Boston area is favorable for ensuring representation of low
base-rate urban groups.

In Filing and Office Machines 140300 with 13K, sample as in 140100.
It would appear that the same regular high schools could be used for these two
program samples. However, you may wish to add a cell for the nonoverlapping
schools, and will probably need to independently resample in Cell 8.

Information Communications Occupations 140400 requires about a 1/5
to 1/4 sample of its 2000 students. Take about that fraction of students
from Boston in Cell 1 and take all students in about five of the other secondary
schools giving this program.

Programs 140500 and 140600 are eligible for SDS (1); if you want to 
sample in 140600, take about half of the students in school 780 and all students 
in about half of the remaining schools. 

The Stenography and Secretarial Occupations program 140700 with 10%K 
students can be sampled in accordance with the scheme for program 140300, except 
that an ad hoc cell for Lowell should be added and the 535 students there sampled 
(take about 100). 

Program 140800, Supervisory and Administrative Management, though
not included in the list in Technical Memorandum #2, has over 1600 students
and probably should be sampled for inclusion in SDS (2). About a 1/4 sample may
be obtained in three cells: Cell 1 - take about 1/4 of those in Boston;
Cell 2 - take all of those in School 745; and Cell 3 - take all those in
about 1/4 of the remaining schools.

The largest program in the system is the Typing program 140900 with its
25K students permitting the most elaborate sampling plan, aiming at a 2-4% sample
(500-1000 students) for SDS (2). The sampling should be accomplished in 11
cells, the first nine of which consist of regular high schools and the remaining
two of which consist of regional high schools. These cells and their sampling
are:

Cell 1 - take one school from, or a 1/3 sample within each school
(better), at Lowell, Newburyport, and Peabody.

Cell 2 - take 20% at Malden

Cell 3 - take 20% at Worcester and at Natick regular high schools

Cells 4-9 - take one or two regular high schools in each region

Cell 10 - take two of the 14 regional high schools in Regions II and IV

Cell 11 - take three of the 17 regional high schools in Regions III,
V, and VI.

The Technical Group- 16 

Students in the technical group programs are concentrated in the post- 
secondary sector. There are only 358 indicated in the secondary sector with 
160108 having the largest enrollment (130). Therefore these programs must be 
assigned to SDS (1) with 95-100% coverage. 

The Trades and Industry Group - 17

This moderately large group is very heterogeneous with many small
programs which are obviously indicated for SDS (1). Several with sufficient
size for SDS (2) are thinly spread over many schools and locales. Moreover,
there is generally more spread across the sectors than in the case of the other
programs; therefore, we will take a special look at the cross-sector
pooling possibility for the development of a special analysis sample for the
manager of a program which appears to be too small for SDS (2) consideration
within sectors.

The Air Conditioning program 170100 is a case in point. The 339
students totalled across secondary and adult sectors would permit SDS (2)
analysis treatment if all data were obtained on all subjects and if they were
appropriately tagged by sector so that dichotomous variables for sector
membership could be generated and used in correlational analysis. Of the programs
considered in the previous groups and assigned to SDS (1), only programs
040206 and 160108 would be candidates for this treatment. In Group 17, Blueprint
Reading 170500, Commercial Art 170700, Masonry 171004, Metallurgy 172400, and
possibly Small Engine Repair 173100 are candidates in addition to the Air
Conditioning program.
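
The sector tagging described for pooled programs might look like the sketch
below. The record layout, the field name "sector", and the function
sector_dummies are assumptions made for illustration; the text specifies only
that dichotomous sector-membership variables be generated.

```python
def sector_dummies(records, sectors):
    """Add one 0/1 indicator per listed sector to each pooled student record.

    Pass every sector except one reference sector, so the indicators are
    not redundant when used together in a correlational analysis.
    """
    tagged = []
    for rec in records:
        row = dict(rec)  # copy, so the pooled input is left untouched
        for sector in sectors:
            row[f"in_{sector}"] = 1 if rec["sector"] == sector else 0
        tagged.append(row)
    return tagged


# Hypothetical pooled records for one program across two sectors.
pooled = [
    {"student_id": "A1", "sector": "secondary"},
    {"student_id": "B7", "sector": "adult"},
]
tagged = sector_dummies(pooled, ["adult"])
```

Here "secondary" serves as the reference sector, so a single indicator
distinguishes the two pooled groups.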

Returning to the secondary sector sampling of Group 17 programs, Body 
and Fender 170301 with about 750 students requires 1/2 sampling. Take all 
students in the five regular high schools in Cell 1, the two self-contained 
schools (406 and 760) in Cell 2, and four self-contained regional high schools
(800 series) in Cell 3. 

The somewhat larger Automechanics program 170302 with about 2,500 
students is widely dispersed, so about a 1/5 sample in seven cells is indicated to 
cover the heterogeneity. The sampling suggested is: 

Cell 1 - trade schools - pick either 405 or 406
Cell 2 - Region I schools regardless of type - pick two (exclude Boston)
Cell 3 - Region II schools regardless of type - pick two
Cell 4 - Region III schools - pick one
Cell 5 - Region IV schools - pick two
Cell 6 - Regions V and VI - pick one school
Cell 7 - Boston - 25-50% of the students (310).

Essentially the same cell structure and selected schools may be used
for Carpentry 171001 (N about 1,900), except that Cell 7 may be deleted.
The somewhat smaller Electricity program 171002 (N about 1,400) requires a 1/4-
1/3 sample in three cells: Cell 1 - take four of the 16 regular high schools;
Cell 2 - take either trade school 405 or 406; and Cell 3 - take three of
the 11 self-contained vocational high schools.
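
Instructions of the form "pick two schools in the cell" imply a random draw
within each cell. A minimal sketch, assuming schools are identified by code and
using a seeded generator so that the draw can be reproduced; only codes 405 and
406 come from the text, and the remaining codes, the seed, and the function
name are hypothetical:

```python
import random


def draw_schools(cells, rng):
    """Pick the planned number of schools at random within each cell."""
    return {
        name: sorted(rng.sample(schools, k))
        for name, (schools, k) in cells.items()
    }


# Each cell maps to (candidate school codes, number of schools to pick).
cells = {
    "Cell 1": (["405", "406"], 1),
    "Cell 2": (["R1-A", "R1-B", "R1-C", "R1-D"], 2),  # hypothetical codes
}
picked = draw_schools(cells, random.Random(1973))
```

Recording the seed along with the selected schools would let the same cell
structure be reused or audited when a later cohort is sampled.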

Plumbing and Pipefitting 171007 has about 360 students making it a
candidate for SDS (2) assignment if all or nearly all students are taken. If
retained in SDS (1), take students in about half the schools.

The Drafting Occupations program 171300 with about 1,400 enrolled can
be sampled at about 1/2-1/3 ratio in four cells:

Cell 1 - take four of 18 regular high schools
Cell 2 - take one of the three trade schools
Cell 3 - take one of the four regional high schools
Cell 4 - take three of the self-contained regional vocational schools.

Electrical Occupations 171400 with about 700 students in the secondary
sector, with a 50% sample, should take one half of the Boston students in Cell
1 and the students in three of the six schools outside of Boston in Cell 2.

According to Technical Memorandum #2, Electronics Occupations 171500 may have
two programs; these are not distinguished in the enrollment counts available
to the author. A tentative sampling plan will be offered on the assumption
that enrollments are concentrated in one of the programs, or that the sector
divisions in the enrollment figures are clearly associated with the two programs.
This, of course, must be checked, and if necessary, the sampling plan revised.
The total secondary enrollment of about 1543 indicates a 1/4 sample, which can be
obtained in three cells:

Cell 1 - take five of the 20 regular high schools
Cell 2 - take one of the five self-contained regional schools
Cell 3 - take [illegible] of the self-contained regional vocational schools.

The Graphic Arts program 171900 with about 1,250 students should have
a 1/3 sample in three cells:

Cell 1 - take half of the Boston students
Cell 2 - take five of the 20 regular high schools, possibly the
same schools as in Cell 1 of 171500
Cell 3 - take three of the nine other schools.

The Metalworking Occupations 172300 have three program indications in
Technical Memorandum #2 (Machine Shop, Sheet Metal, and Welding). These are
not distinguished in the enrollment distributions available to the author.
Despite the very large total enrollment for which 1/10 to 1/5 sampling appears
indicated, a cell structure would be quite moot until more is known about
these programs and their distribution. They are probably sufficiently different
in content that they must be kept separate for both sampling and analysis
purposes.

If Cosmetology 172602 with 336 students is to be included in SDS (2),
take all subjects or take all at Lowell for one cell and take two of the re- 
maining seven schools for the other. For SDS (1) take about half as many from 
Lowell and add one school. 

Quantity Food Occupations 172900, if a single program with 616 secondary 
enrollment, can be sampled by taking all 127 in Boston in Cell 1, two of the 
six other regular high schools in Cell 2 and two of the six self-contained 
regional vocational high schools in Cell 3.

It is not clear whether the enrollments available to the author for
Woodworking Occupations 173600 are for a single program (Millwork and Cabinet-
making) or for a total involving more than one program. On the assumption that
they are for a single program, the 1,450 secondary enrollment indicates a 1/4-
1/3 sample for SDS (2) in four cells:

Cell 1 - take half or all of Boston
Cell 2 - take four of 20 regular high schools
Cell 3 - take school 625 or 675
Cell 4 - take school 853 or 872.

This essentially completes the recommendations for secondary sector
sampling. It should be noted that the Boys Vocational Extension program has
242 students all in Boston, and if homogeneous in content, may be considered
for SDS (2). It is more likely to be in SDS (1) in any case, and definitely
should be if heterogeneous. The Office, Other program 149900 should be checked to see
if subprograms may be associated with the high enrollment concentrations at
Gardner and Pittsfield, which are high enough for them to be candidates for
SDS (2).



Part III. Sampling Considerations in the Other Sectors

The Postsecondary Sector

The postsecondary programs for youths leaving or completing high school
(this distinction might be an input characteristic) are for preparation to
enter the labor market and are concentrated at the secondary schools (grade 13) of
various types and at the 13 community colleges.

There are no candidates in program types 01, 04, 09, 16, 17, from the
Postsecondary sector for inclusion in the SDS (2). Programs 040800, 070101,
090201, 160106, 160108, 150109, and 161113, have large enough enrollments
that some sampling may be desired for SDS (1), probably 50-100% so that you have
100-200 subjects.

Sampling plans will be presented for two programs in the Health Group 
07, and for four programs in the Office Group 14. The 625 students in the
Nurse-Associate Degree program 070301 are in eight community colleges. Pick 
four of them for about a 50% sample. The 794 students in the Practical Nurse 
program 070302 can be sampled in three cells: Cell 1 - take 50-100% of the 
Boston students; Cell 2 - pick three of seven regular high schools; Cell 3 - 
pick three of seven remaining schools. 

In the Office Group-14: Accounting and Computing 140100 - 600 students
are found in eight community colleges. Take the students in four or five of the
colleges.

Business Data Processing 140200 - 600 students. Take those in four 
of the eight community colleges as one cell, and all in the self-contained 
regional vocational high schools in the second cell. Alternatively, the sample 
in the second cell may consist only of those in school 806. 

Stenography and Secretarial Occupations 140700 - take those in three 
or four of the eight community colleges. 

Supervisory and Administrative Management 140800 - take those in two

It may be logistically convenient to pick the community colleges which
have these four programs in common with some violation of independent random
sampling principles, assuming more or less homogeneity of student input and
within-program processes across the community colleges. Note, too, that no
information is available for some of the community colleges and the remaining
colleges should be considered in the sampling.

The Adult Preparatory Sector

Adult Preparatory Programs provide part-time training to prepare for
a new occupation. There are no candidates in program types 01, 04, 07, or 16
from the Adult Preparatory sector for inclusion in the SDS (2). Programs
140300, 171001, 171002, and 171500 have enrollments large enough that some
sampling, probably in the 20-50% range, may be desired for SDS (1). This
may also apply to the Other T & I 179900 with its 164 students concentrated in
Boston.

Sampling plans will be presented for one program in Group 09, four pro-
grams in Group 14, and three programs in Group 17. Comparative Homemaking
090100 - Form two cells; take half of those students in Pittsfield for one
cell, and all those in schools 073 and 150 for the other.

Accounting and Computing 140100 - 295 students are scattered across
some 13 regular high schools. Either take all for SDS (2) or those in about
half the schools for SDS (1).

Business Data Processing 140200 - Take all or form two cells taking 
those in five of seven regular high schools and three of four self-contained
regional vocational schools. 

Stenography and Secretarial Occupations 140700 — Take all of the 350 
students in 10 regular high schools. 

Typing Occupations 140900 - Take those in five of 15 regular high schools.

Automechanics 170303 - Take all except in schools with less than 10.

Metalworking 172300 - Counts show 493 students in 23 schools, but com-
ments about three different programs, made for the Secondary sector, apply here.

Quantity Food Occupations 172900 - Take a 60-75% sample of the students
in school 821.

The Adult Supplemental Sector 

The Adult Supplementary sector consists of part-time training for those 
already in the labor market, but needing to update or upgrade their skills. 

No supplemental programs in Groups 01, 04, 07, 09, or 16 are large 
enough for inclusion in the SDS (2). Programs 040800, 14900, 160605, 170100, 
and 171900 may require 50-75% sampling for SDS (1). All available subjects 
should be taken for SDS (2) in programs 140100, 140200, 140700, 140800, 170302, 
171001, 171400, 171500, 171900, 172801, and 172902, if these are included in 
SDS (2). In some cases, a particular school will have less than 10 subjects 
and can be omitted. If it is decided that these programs within the Supplemental 
sector are to be assigned to SDS (1), despite sufficient numbers of subjects 
for some correlational analysis, they may be sampled at a 25% sampling ratio. 

Sufficient numbers of subjects or special school distributions for
more detailed sampling plans are found in programs 171002, 171007, and 172300:

Electricity 171002 - Take three of six regular high schools for Cell 1;
take all in school 405 for Cell 2; and take three of six schools in the "800"
series for Cell 3.

Plumbing and Pipefitting 171007 — Supplement the 200 in Boston (Cell 1) 
with those from half of the remaining schools (Cell 2). 

Metalworking Occupations 172300 — Although over 900 subjects are 
available, detailed sampling plans are deferred until the three program dis- 
tributions are available. 

The Apprentice Sector

The Apprentice sector provides adult classes for those workers in the
trades and industrial occupations under an apprenticeship training agreement.
In the Apprentice Sector, the Masonry program 171004 should be sampled about 1/4
for SDS (1); take 1/4 of those in Boston and two of the seven remaining schools.
All available subjects, except those in schools with less than 10, should be
taken for SDS (2) in programs 171002, 171500, 172400. Also for SDS (2):

Carpentry 171001 - Take about 1/2, using two cells: regular high
schools and schools in the "800" series.

Plumbing and Pipefitting 171007 - Take about a 1/2 sample, using three
cells:

Cell 1 - Boston
Cell 2 - "800" schools
Cell 3 - Remaining schools

Electrical Occupations 171400 - Take about 3/4 of the students from
Boston.

Metalworking 172300 - Defer sampling until the three-program distribution
is obtained.

The Manpower Development Training Act Sector (P.L. 87-415)

The special problem for sampling in the MDTA Sector is that approximately
3,500 students are distributed over some hundred program-locale combinations;
of the approximately 40 programs, about 25 are given at single locations and
15 at 2-11 locations. With few exceptions, these programs must be 100%
sampled and included in SDS (1). To the extent that these programs involve a
special federal-state relationship and the allocation of funds and accountability
problems may be of special concern, MISOE may need to consider a third sampling
data system in which SDS (1) type of data is supplemented with some limited
process-product information. MISOE may also wish to consider allowing some
correlational analysis with higher sampling error risks, letting the program
size cutoff within this sector drop to the 200-250 range (standard error of
null correlations about .06). Programs given at single locations have 60 or
less subjects and should be assigned to SDS (1) with 100% representation, as
should multiple location programs with 100 or less subjects. Programs
with 100-250 subjects should be sampled in the 25-75% range for inclusion in
SDS (1) or sampled 100% for inclusion in an SDS (3) if that option is chosen.
For these smaller or borderline program decisions recall the small program
options discussed earlier in this paper, especially the option of cohort
pooling.
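
The sampling error attached to a program size cutoff can be checked with the
usual large-sample approximation for the standard error of a correlation when
the true correlation is zero, 1/sqrt(n - 1). That approximation is an
assumption of this sketch; the paper does not state which formula lies behind
its figure.

```python
import math


def null_correlation_se(n: int) -> float:
    """Approximate standard error of a sample correlation under a true
    correlation of zero, using the large-sample form 1 / sqrt(n - 1)."""
    if n < 2:
        raise ValueError("need at least two subjects")
    return 1.0 / math.sqrt(n - 1)


# Program sizes at the lowered MDTA cutoff and at the general 300 cutoff.
se_by_size = {n: round(null_correlation_se(n), 3) for n in (200, 250, 300)}
```

Under this approximation the standard error is about .071 at 200 subjects,
.063 at 250, and .058 at 300, which is broadly in line with the "about .06"
cited for the 200-250 range.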

What is left for consideration are the few MDTA programs where inclusion 
in SDS (2) is a possibility either directly or after consideration of certain 
conditions and options. The first of these is the program labeled Prevocational. 
Assuming that this has some functional homogeneity and is not a group of small 
but very different programs, form either one or two cells from Boston, depending 
on the importance of distinguishing Project 1063 from 0117001; the former could 
be sampled at about a 1/3 ratio. One or two additional cells may be formed
on a similar basis for the Springfield students with 100% sampling, and a cell 
added containing all students at the other four locations. 

In the Nurses Aide program, take all available cases, except perhaps 
those in locations where there are only nine or 10 students. The Clerical 
Occupations program may be sampled in two cells: Cell 1 - take 1/3 to 1/2 
of the students in Boston and all of those at Fitchburg and Quincy.

The Licensed Practical Nurse program occurs at two levels with 175 
regular students at diverse locations and 100 in an accelerated program given 
in Worcester. It is suggested that all available students be included with
the accelerated students tagged for analytical purposes. The alternative is
to leave this program in SDS (1) or SDS (3). A factor to be considered in
such decisions is the possible importance of an allied health professional program.

Epilogue

The illustrative sampling plans presented in Part II for the Secondary
sector and in the foregoing sections of Part III for the other sectors of
occupational education are based on a cutting point of about 300 students enrolled
in a program within a sector for deciding whether the program belongs in SDS (1)
or SDS (2). This cutting point is based on a consideration of what is possible
given the standard errors of correlational statistics within and between input,
process, and product spaces, and the desire to provide for MISOE the maximum
possible long-range flexibility for service to management. Immediate imple-
mentation of all of these plans is not necessarily indicated for initial
development and implementation of MISOE as an operational system; they may be
modified considerably, especially in those programs with sector enrollments in
the 300-800 range. A major consideration in such modification is the fact that
the response to followup surveys for impact data is certain to be less than
100% of those originally sampled, even with 100% attempt to contact. Con-
nectability with impact data so obtained will be possible only for those who
do respond to the followups (optimistically 80% in the one-year followup,
60% in the three-year followup, 40% in the five-year followup, and 30% in the
10-year followup). Whatever the response rate (and indeed the anticipated
rate for a 10-year followup is so low as to indicate that this one be abandoned),
allowance must be made in the original sampling to ensure that the analyses
of longitudinal data will not be rendered meaningless by being mostly the
result of sampling errors. To accommodate this, many of the programs with
sector enrollments in the 300-800 range may have to be kept in SDS (1) with
the understanding that correlational analysis of cost-impact data with input
control will be impossible unless sampling ratios are increased.* For the re-
maining programs, increasing the sampling ratios will be helpful, but will
generally require a greater logistic load on the staff because more schools
will be involved, and will increase the total costs of obtaining and processing
longitudinal data. Followups are expensive.

* ... sampling recommendations even if the programs are assigned to SDS (1).

Part IV. Special Sampling Considerations

This part of Occasional Paper No. 12 discusses the sampling considera- 
tions related to initiating MISOE data systems on a cross-sectional basis, to 
replacing cohorts, to obtaining control group samples, and to obtaining followup 
data for longitudinal studies.

Initiating MISOE Data Systems

The first priority is to initiate the census data system, obtaining complete
enrollment, completor, and cost data, and age, sex, and race input counts for all
programs at all locations, in order to establish the full basis for sampling
and weighting as well as the basis for management summary reports. Enrollment,
input counts, and anticipated costs can be obtained for all ongoing cohorts,
with actual costs and number of completors obtained as soon as possible on
completion of each program. Using the latest available actual costs and numbers
of completors from the latest cohort is not recommended because such actual
costs are probably the basis for the present estimated costs and such completor
counts are based on earlier enrollments. All pieces of the census data should be
obtained for the same cohorts within programs to ensure connectability within
the census data system and with the sample data systems. Anticipated and actual
costs should be separately tagged. The data units in CDS are program-locale com-
binations by sectors. The data units in the sample data systems are individuals
so that age, sex and race data must be included in the detailed input data by
individual in the sample data systems.
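
The two kinds of data unit, and the key that keeps them connectable, could be
represented roughly as follows. All class and field names are hypothetical and
chosen for illustration; the paper specifies only that CDS units are
program-locale combinations by sector, that SDS units are individuals, that
both must refer to the same cohort, and that anticipated and actual costs be
tagged separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnitKey:
    """Identifies one program-locale combination within a sector and cohort."""
    sector: str
    program: str
    locale: str
    cohort: str


@dataclass
class CensusRecord:
    key: UnitKey
    enrollment: int
    anticipated_cost: float
    actual_cost: Optional[float] = None  # tagged separately; known on completion
    completors: Optional[int] = None     # known on completion


@dataclass
class StudentRecord:
    student_id: str
    key: UnitKey  # same key as the census unit the student belongs to
    age: int
    sex: str
    race: str


def connect(student, census_index):
    """Return the census record for a sampled student's unit, if present."""
    return census_index.get(student.key)


key = UnitKey("Secondary", "140900", "Lowell", "cohort-1")
census_index = {key: CensusRecord(key, enrollment=120, anticipated_cost=1000.0)}
student = StudentRecord("S001", key, age=17, sex="F", race="unspecified")
unit = connect(student, census_index)
```

Because the cohort is part of the key, a student record cannot be matched to
census counts taken on a different cohort, which is the connectability
condition the paragraph above insists on.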

Detailed input data are required in both SDS (1) and SDS (2) for both 
cross-sectional and longitudinal purposes. In the input space the same data 
could serve both purposes to simplify logistics and to reduce the lead time 
for obtaining data fully connectable across the IPPI elements. It is important 
that input data be obtained as soon as possible at the beginning of each program
to avoid process contamination. For this reason, it is not recommended to obtain
"input" data on a current cohort 1/3 to 1/2 way through a three-year program. It
would be better to wait until a new cohort is available, even if this means
delay in MISOE becoming fully operational with the longer programs. Time is
sometimes a friend and sometimes an enemy.

The process descriptions and variables will be required 
for the programs in SDS (2). The observation units are programs within sectors 
with possible school-to-school variations. The student data records will con- 
tain process variables with the same values for all students going through a 
particular process. It is rather unlikely that the processes will change much 
from one cohort to the next, so that initially process data on currently
ongoing programs may be obtained for the cross-sectional initiation of MISOE 
and most of it may be updated with change information for longitudinal pur- 
poses. On this assumption the process data should be generally connectable to 
input data in SDS (2) for both cross-sectional and longitudinal purposes. 
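
Since process variables take the same values for every student going through a
given process, attaching them to student records amounts to a lookup by
program within sector. A small sketch, in which the field names and the
process variable "shop_hours_per_week" are invented for the example:

```python
def attach_process(students, process_by_unit):
    """Copy a unit's process variables onto each of its student records."""
    merged = []
    for s in students:
        unit = (s["sector"], s["program"])
        merged.append({**s, **process_by_unit[unit]})
    return merged


# Hypothetical process description, keyed by (sector, program).
process_by_unit = {("Secondary", "170302"): {"shop_hours_per_week": 15}}
students = [
    {"student_id": "S001", "sector": "Secondary", "program": "170302"},
    {"student_id": "S002", "sector": "Secondary", "program": "170302"},
]
merged = attach_process(students, process_by_unit)
```

If school-to-school variations in process prove important, the lookup key
would need to include the school as well.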

In the product space, program completion-noncompletion for each sampled 
student should be available for both SDS (1) and SDS (2) on termination of 
current cohorts. For those behavioral objectives and phase completions which 
are a matter of program records or a matter of paper-and-pencil achievement
testing, product information may be retrieved for current cohorts for cross-
sectional purposes in SDS (2) and gradually supplemented as these programs
are continued through their final phases. Product data which require direct
staff observation will be possible on current cohorts only for those phases
and their objectives which have not already been missed, unless special
observational sessions are set up for observing performances sometime after
completion of the earlier phases. Once the longitudinal cohorts are set up
and are being followed through, the timing and logistics of collecting product
information can be more uniformly determined and applied as a function of within-
program process schedules.

Up to this point, there is reasonable connectability for cross-sectional
data between input and process in SDS (2) and between process and product, but
weak or uncertain connectability between input and product. There is also
a great deal of extra staff and logistic effort to obtain the cross-sectional
data, some of which will require either time to mature or loss of connectability.
The cross-sectional concept, presumably considered to advance the time by which
MISOE might become at least partially operational, may not achieve this purpose,
or do so at the expense of the analytic utility of the data. If only entry
level analysis is contemplated with the cross-sectional data, thus not requiring
connectability among IPPI elements, it may still be worthwhile, as Technical
Memorandum #2 points out, for testing the system and providing considerable
information.

Impact data obtained on a cross-sectional basis will not be connectable 
with anything, except with retroactively retrievable data in the other elements 
from cohorts which have already passed through the educational system. This 
is a plausible and useful thing to attempt for cohorts that completed programs 
in recent years, assuming that school records and supplementary sources will 
provide viable names and addresses, in addition to the cost data, and any other 
data that are to be connectable with impact in either SDS (1) or SDS (2). The
main value of obtaining data in impact space prior to the maturation of a lon-
gitudinal cohort lies in the development of the logistics for followups and
estimation of response rates. Both are somewhat dubious as indications of
what may be encountered in the longitudinal followups.

Because the utility of cross-sectional information is so limited and 
presumably temporary, and analysis largely confined to entry level descriptive 
statistics, smaller samples can be used than for the longitudinal data collection. 
Fewer cells, fewer schools and lower ratios of sampling of students within 
some program-locale-sector combinations would be involved with higher weights.
Because the same students may not be sampled for the various MISOE elements,
separate weighting factors may have to be computed for each data type, to
maintain some semblance of representativeness across the cross-sectional
infusion of the two data systems.
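
When different students supply different data types within the same cell, a
separate weight per data type follows directly from the number of students
actually observed for that type. The figures and the function name below are
hypothetical:

```python
def weights_by_data_type(enrolled, sampled_by_type):
    """Weight for each data type: census enrollment over students observed.

    Data types with no observed students are omitted, since no weight
    can make an empty sample representative.
    """
    return {
        dtype: enrolled / n
        for dtype, n in sampled_by_type.items()
        if n > 0
    }


# One hypothetical cell with 1200 enrolled and uneven coverage by data type.
cell_weights = weights_by_data_type(
    1200, {"input": 300, "process": 300, "product": 200, "impact": 150}
)
```

In this example each student with impact data stands for twice as many
enrolled students as each student with input data, so analyses on the thinner
data types rest on heavier weights.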
Cohort Replacement 

When a sector cohort has completed a program, it is to be replaced 
with the next cohort. Assuming that program schedules are approximately the 
same in one locale as in another, the replacement should pose no serious 
sampling and weighting problem. However, using the same cell structures and 
sampled locales, which should prove the simplest approach, the enrollments 
will likely change and, therefore, the weighting factors to be applied to the 
new cohort data will also change. To ensure continuous connectability between 
the census and sampling data systems, the census data should be continuously
updated at cohort replacement time program-by-program within sectors. Note
that the CDS carries more than the basis for "timely summary reports"; it is
also the basis for weighting the samples to be representative of the current 
status of the system of occupational education. 

Minor adjustments in the sampling may also be required if a program is
given intermittently at some sampled locale. The locale should be replaced
with another (at random) if available; otherwise, the weight adjustment will
have to be relied upon to cope with the sampling change. Actually this might
involve a bias, presumably small.
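
The replacement rule for an intermittently offered program can be written down
compactly. In this sketch the locale names are placeholders, and the second
return value simply signals whether the fallback to weight adjustment is
needed:

```python
import random


def replace_locale(sampled, dropped, unsampled, rng):
    """Swap a dropped locale for a random unsampled one from the same cell.

    Returns the new list of sampled locales and a flag telling whether a
    replacement was found. With no candidate left, the caller has to rely
    on the weight adjustment alone.
    """
    kept = [loc for loc in sampled if loc != dropped]
    if unsampled:
        return kept + [rng.choice(unsampled)], True
    return kept, False


rng = random.Random(7)
new_sample, replaced = replace_locale(["A", "B"], "B", ["C", "D"], rng)
no_candidates = replace_locale(["A", "B"], "B", [], rng)
```

Drawing the replacement from the same cell keeps the cell definition intact,
so only the weights need recomputing from the updated census counts.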

Any major shifts in the locales where a program is given or major changes
in enrollment in a locale, by program expansion or contraction or by redistri-
bution, will require adjustments in cell definition and possibly in sampling
ratios within cells.

A number of comments have been made about cohort replacement in Occasional
Paper No. 7 and in the sampling parts of this paper. These should be reviewed
and integrated in MISOE development and planning.

Sampling for Control Groups

To be able to perform comparative analyses between occupational and
non-occupational education, MISOE requires control group samples of students
in academic and general programs in the Secondary and Post-secondary sectors.
Input, impact, general educational development (pretest and posttest), and
program cost data will be obtained for comparison with the same data types in
occupational education. The appropriate sampling procedures depend upon the
type of comparative analysis anticipated. The OE analysis of a given program
may be compared with three different non-OE analysis types (six in the Secondary
sector when the distinction between academic and general education is considered).
One non-OE type of analysis for comparison is based on only those students in
non-OE programs given in the same schools as those where a given OE program is
offered. Another non-OE type of analysis for comparison is based on students
attending schools in which OE programs of any kind are given. This would use
a common sample for non-OE analysis for comparison with OE analysis for any
program, but would base the comparison on a representative sample of only that
part of the state's non-OE system. The third base would be a common sample
for comparison with any OE program, but would be representative of the state's
entire non-OE system for that sector.

These types of comparative bases correspond to somewhat different 
analytic hypotheses, and have different sampling and logistic implications. 
The first basis requires separate sampling for each program for which a com- 
parison is to be made and would represent a very large effort to obtain a fussy 
refinement, which may not represent the real concern of management. The 
distinction between the first two and the third bases represents the fact that 
non-OE programs are given at schools with and without OE programs. The 
sampling considerations for non-OE will be discussed on the third basis, i.e., 
that a common comparative basis including students at schools without OE
programs will be used to make analytical comparisons for each OE program
analyzed. The non-OE analyses (one for academic in each sector, and one for
general in the Secondary sector) should contain a dichotomous variable
indicating whether the non-OE student was in a school where occupational
education was also given.

In both the Secondary and Post-secondary sectors, enrollment counts on
a census basis for non-OE students are required to obtain a basis for separate 
weighting of the non-OE data on the assumption that the students in academic 
and general programs are not distributed across schools in the same proportions 
as those in occupational programs, and to allow sampling of those non-OE 
students in schools without OE programs. The census counts should be obtained 
separately for those in academic and general programs; it may or may not be
possible to use the same school level sampling for both. 

MISOE may use a completely independent cell structure for the non-OE 
sampling, a structure based entirely on the non-OE census counts and school
distribution. It may, however, be logistically convenient, and it would enhance
the comparability to sample non-OE students from the same schools in the same 
cell structure as those used in the OE sampling, but to supplement with additional 
region-defined cells consisting of schools where occupational educational 
programs are not given. This supplement will not be necessary if the second 
of the three comparative bases, discussed above, is chosen. It should be 
noted that if the third basis is used, and later, it is decided that analysis 
on the second basis is desired, this will be possible but the reverse would 
not be the case. 
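
The asymmetry between the second and third bases follows from the dichotomous
school-type variable: a third-basis sample contains the second-basis students
as a subset, whereas a second-basis sample has no students from schools
without OE to add back. A sketch with an assumed field name "in_oe_school":

```python
def second_basis_subset(non_oe_sample):
    """Keep only non-OE students attending schools that also offer OE."""
    return [s for s in non_oe_sample if s["in_oe_school"] == 1]


# Hypothetical third-basis control sample covering both kinds of school.
third_basis = [
    {"student_id": "N1", "in_oe_school": 1},
    {"student_id": "N2", "in_oe_school": 0},
    {"student_id": "N3", "in_oe_school": 1},
]
second_basis = second_basis_subset(third_basis)
```

Weights for the reduced sample would still have to be recomputed against the
non-OE census counts for schools with OE programs only.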

Often in situations of this type smaller samples can be used for control 
groups. However, the sampling error constraints, including those implied by 
nonresponse to followup surveys to obtain impact data, indicate the choice of
a sampling ratio such that 1000 academic students and another 1000 general
students be included in the samples. Even this will be barely adequate, if at
all, to allow for nonresponse to the longer range followups. It is likely
that the sampling ratios within cells can be smaller to build up the common
sample across schools, especially in cells where the non-OE students constitute
a larger proportion.
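
The effect of nonresponse on a sample of this size can be worked out from the
optimistic response rates given in the Epilogue (80%, 60%, 40%, and 30% at the
one-, three-, five-, and 10-year followups). The sketch below uses whole
percentages and integer arithmetic; the function names are introduced here for
illustration.

```python
# Optimistic response rates from the Epilogue, as whole percentages keyed
# by years since program completion.
RESPONSE_PCT = {1: 80, 3: 60, 5: 40, 10: 30}


def expected_respondents(sampled: int) -> dict:
    """Expected number of respondents at each followup wave."""
    return {years: sampled * pct // 100 for years, pct in RESPONSE_PCT.items()}


def required_sample(target_respondents: int, years: int) -> int:
    """Smallest original sample expected to yield the target at one wave."""
    pct = RESPONSE_PCT[years]
    return -(-target_respondents * 100 // pct)  # ceiling division


by_wave = expected_respondents(1000)
needed = required_sample(300, 5)
```

A control sample of 1000 is expected to shrink to 400 respondents at five
years and 300 at ten, and an original sample of 750 is needed to expect 300
respondents at the five-year followup.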

In the Postsecondary sector, some of the occupational programs are 
given in the secondary schools as well as in the community colleges. Thus, 
the comparative basis should include some post-secondary students in secondary
schools taking academic programs, if such exist. If not, the comparison basis
will be community college students in academic programs. In the Secondary 
sector, the comparative basis should include only secondary school students. 

Detailed sampling plans cannot be offered at this time, but should be 
derived as soon as the necessary census data are available, and decisions are 
made regarding the bases for comparative analyses,* and on the issue of 
independent vs. matched cell structures. 

Sampling Considerations for Followups 

Although some of the impact data may be obtained from public records, 
most such data are expected to be obtained by contacting former students by 
mail with a survey questionnaire, at stated points in time following program 
completion. This will be required for those students included in SDS (2) 
and in the control groups, and at least for the larger sampled programs in 
SDS (1). Because of the anticipated problem with nonresponse to mailed 
questionnaires, all originally sampled students in programs for which detailed 
impact data are desired should be included in the mailout group. It is important 
to include dropouts and any transferring from one program to another. 

* Enrollment counts given in the Public School Directory are by sex 
of students, but not by occupational vs. academic vs. general programs. 

It is important to obtain viable names and addresses from the students 
while "in the pipeline" and to update them on respondents to one followup for 
use on the next followup. Phone numbers may be helpful in following up hard 
core nonrespondents after sending out reminder postcards and using other 
techniques for reducing the nonresponse rates. Special additional weighting 
procedures will be required to adjust the longitudinal data for nonresponse 
bias. 

The temptation to follow up random subsamples of the originally sampled 
students should be resisted. This procedure is indicated in programs with much 
larger input groups to reduce the followup costs. In MISOE, the reduced samples 
combined with the nonresponse problem will result in insufficient cases for 
stable analysis beyond the entry level. Therefore, the best strategy for 
MISOE is to follow up all originally sampled students and use techniques to 
minimize nonresponse and the bias associated therewith. 

Part V. Weighting Procedures 



Data from samples in SDS (1) and SDS (2), including data from the 
control group samples, must be weighted so that, in any aggregation and analysis, 
the population sampled or any defined subpopulation thereof will be 
reasonably represented. Weighted results of aggregation and analysis will 
then be relevant to the populations or subpopulations of interest to management. 
By computing and applying weights for each record at the student level, 
maximum flexibility and controls for bias are obtained. The initial set of 
weights corrects for the varying sampling ratios among the various stratification 
cells for each program within each sector, and for any within-school 
random sampling. For noneconomic data, the weights are based on a comparison 
of the numbers of students (the ultimate sampling unit) included in the sample 
with the census totals. The options for economic data will be discussed in a 
later section. Weights are also required to control longitudinal data files 
for nonrandom losses due to nonresponse to followup surveys. The final weights 
to be applied to individual sample records in a given aggregation or analysis 
are products of the directly computed weights. 

Occasional Paper No. 7 indicates how the weights may be posted to 
records and used in aggregation. The only additional problem on this point 
has to do with the non-integral nature of the weights and how the weights are 
to be placed on the files and used in analysis. This matter will be discussed 
in a later section on software considerations in weighting. Occasional Paper 
No. 7 also considers the analysis implications of legitimate losses of some 
kinds of process and product data due to dropouts; a special section of this 
part of the present paper discusses the options for weighting adjustment for 
such "dropout effects" in analysis. 

Basic Input Weighting 

The first type of weight to be computed and applied to all student 
records is the cell weight (Weight I). For any program within a sector, the 
weight must be computed separately for each cell. In any program within a 
sector for which there is a single sampling instruction to "take all subjects", 
i.e., 100% sampling within a cell, Weight I will, of course, be 1.00. 
Any program within a sector for which no explicit sampling structure has been 
indicated should be regarded as having a single cell which may or may not have 
been 100% sampled. To compute the type I weight, first cumulate the census 
enrollments for that cell across all schools included in the cell definition. 
Next cumulate the total enrollments in that program in those schools actually 
included in the sample. Weight I is obtained by dividing the former by the 
latter. Note that it is necessary to have the census counts which are not 
yet available to be sure of the cell structure and sampling ratios to be used, 
and to include these counts in the numerator for computing Weight I; otherwise, 
MISOE will not be dealing with the full population of students undergoing 
occupational education (or non-OE in the case of the control groups), but 
only that biased portion of the system that has readily responded with census 
data. 
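
The Weight I computation just described can be illustrated with a minimal sketch. It assumes the census enrollments for one program within one sector are held as (cell, school, enrollment) rows; the function name, cell labels, school names, and counts are all hypothetical and are not taken from the MISOE census files.

```python
from collections import defaultdict

def cell_weights(census_rows, sampled_schools):
    """Weight I per cell: census enrollment over all schools in the cell,
    divided by census enrollment in the schools actually sampled."""
    total = defaultdict(int)     # numerator: every school in the cell
    sampled = defaultdict(int)   # denominator: sampled schools only
    for cell, school, enrollment in census_rows:
        total[cell] += enrollment
        if school in sampled_schools:
            sampled[cell] += enrollment
    # A cell with no sampled enrollment has no defined weight and is skipped.
    return {cell: total[cell] / sampled[cell]
            for cell in total if sampled[cell] > 0}

# Hypothetical census rows for a single program within a sector.
census_rows = [
    ("cell_1", "school_a", 40),
    ("cell_1", "school_b", 60),
    ("cell_1", "school_c", 100),
    ("cell_2", "school_d", 30),   # a "take all" cell
]
weights = cell_weights(census_rows, {"school_a", "school_c", "school_d"})
print(weights)
```

Here cell_1 receives a weight of 200/140 and the fully sampled cell_2 receives 1.00. Differential weighting by sex or race would amount to running the same computation separately on the counts for each subgroup.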

Some additional control will be obtained if Weight I is computed 
separately by sex and/or race of the students. In many programs, there is a 
strong predominance of one sex or the other. The minority sex should be included 
in the sample, but differential weighting is less critical than in the 
case where 25-75% of the students are of one sex. 

If the differential option is not taken, there will be a weight of the 
first type for each cell within each program within each sector. This will be 
posted to the data records of all students sampled within the cell. If the 
differential option is taken, there will be twice as many weights in those 
programs for which the option is taken, but only that weight for males will be 
posted to records for male students and analogously for female students. Similar 
logic and operations are involved if a differential weighting option is chosen 
for a white-nonwhite control. 

The second type of weight, which may also be computed differentially by 
sex and/or race, is the ratio of the number of students in a program (within 
sector) within a school to the number from that school with the same sector 
program included in the sample. In most cases, this will be 1.00, since all 
subjects are usually taken within a school. However, in the larger enrollment 
programs which are given in the large schools, some random sampling within 
schools was indicated and Weight II will allow for this. Note that in 
the "Boston" cells, the distribution of students among and within schools is 
not yet specified, so that sampling within Boston which is not 100% will imply 
non-unit weights of type I if a subset of schools is chosen, and non-unit weights 
of type II if a sample of students within one or more large schools is taken. 
Weight II will be computed for each sampled school and posted to the data records 
for all sampled students from that school. 

Weight III is the product of Weights I and II by the appropriate sex 
and/or race of the student if the differential options are taken. It is 
uniquely computed for each student and posted to his data record. Weight III 
is unique for the student in terms of his sector, program, and school. Weight 
III will normally be applied to all aggregates of input data. It will either 
also be applied to data in other spaces or be a component of a product weight 
to be applied to other space aggregates and in interelement analyses. 
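
As a rough illustration of how Weights II and III combine, the sketch below posts a Weight III to each sampled student record and sums the weights to estimate the cell population. The cell weight, school names, and enrollment figures are invented for the example, and the record layout is an assumption rather than the MISOE file format.

```python
def weight_ii(enrolled, sampled):
    """Weight II: program enrollment in a school over students sampled there."""
    return enrolled / sampled

# Hypothetical cell: Weight I of 2.0, two sampled schools.
cell_weight_i = 2.0
schools = {
    "school_a": (20, 20),    # (enrolled, sampled): all subjects taken
    "school_b": (120, 40),   # large school, random sample within school
}

records = []
for school, (enrolled, sampled) in schools.items():
    w3 = cell_weight_i * weight_ii(enrolled, sampled)   # Weight III = I x II
    # Post the same Weight III to every sampled student from this school.
    records.extend({"school": school, "w3": w3} for _ in range(sampled))

# Summing Weight III over the sample estimates the cell's census enrollment.
estimated_population = sum(r["w3"] for r in records)
print(estimated_population)
```

With these figures the 60 sampled records sum to 280, which is Weight I (2.0) times the 140 students enrolled in the two sampled schools.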

The Dropout Problem and MISOE Options Thereon 

The nonavailability of some process and product data for dropouts 
poses some special problems for MISOE both in aggregations and in analysis. 
Unfortunately, dropouts are not random losses and the problem is more severe 
in programs with high attrition and where, as is likely, the attrition occurs 
early in the program. The census data will include counts of completors but 
no indication of the process stage at which attrition occurs. This can be 
estimated by applying type III weights to dropouts in the samples, but not 
used as a basis for differential control of dropout bias in terms of points in 
time where losses occur. 

Actually, the application of type III weights to data on dropouts only, 
or to that on completors only, would not give unreasonable estimates in aggregation 
and analysis of the subpopulations of dropouts and completors, any more 
than would be the case for any other nonrandom subpopulation. The statements 
on page 46 of Occasional Paper No. 7 about analytical options with respect to 
these subgroups and their combination and on whether or not data are available 
still hold. MISOE has an alternative, refining option, and that is to recompute 
Weights I, II, and III (as IV, V, VI, respectively) differentially by 
dropout status with Weight VI being applied to data. This would have to be 
done at program completion time when that status is known on a census basis 
and for all sampled students by program within sector and by stratification 
cells and schools. Alternatively, the differentiation can be confined to 
Weight IV as the recomputed Weight I (as differentiated or not by sex and/or 
race), eliminating the need for dropout status differentiation of Weight II and 
separate accounting for within school dropout variation. With this alternative, 
Weight I is adjusted to become Weight IV and Weight V is the adjusted 
Weight III, the product of Weights II and IV, to be applied to process and 
product data (input, too, in correlational analysis). This alternative is 
recommended if the refining option is taken at all, an option which is some 
additional trouble, but which introduces some partial control for dropout 
variations in the data across programs and locales. 

Another alternative is an elaborate and expensive adjustment which 
adapts the logic and operations to be described below for controlling nonresponse 
bias in the longitudinal data including impact data from the followups. 
This possibility is not recommended at this time, but can be investigated more 
fully, if necessary. 

Weighting of Cost Data 

The importance of the economic data and their analytic role in MISOE 
require special attention to developing options for sampling, weighting, and 
integration of economic with noneconomic data in analysis. The discussion of 
such options makes the following assumptions: 

1. Analogous to the enrollment distributions for the system, the 
census data system will include anticipated and actual total program costs for 
each program within each sector broken out by individual schools. 

2. Derivative cost information will be available on the census basis 
from allocations of total costs to certain economically defined categories 
(capital costs, instructional, physical plant maintenance, etc.). 

3. More detailed process costs will be available in SDS (2) for sampled 
programs within sectors by locales, and broken out by instructional phases 
(blocks, units, etc.) and program objectives. 

4. The availability of enrollment data on a census basis permits conversion 
of any costs, or their allocations, to a per student basis when and 
if desired, within any breakout, in any data system. 

5. Economic data such as family income (in input space) or earnings 
of graduates (in impact space) are obtained from and about individual subjects, 
and may be treated in sampling, weighting, and analysis in the same manner as 
other "student characteristics" data. 

Generally, from the standpoint of interconnectability of economic and 
noneconomic data, and for logistic convenience as well, sampling for economic 
data should follow the same sampling plan and actually the same samples of 
schools within programs within sectors as that for the noneconomic data, as 
described in Parts II and III of this paper. In sampled schools, where less 
than 100% of the students are sampled and where conversion to a per student 
basis is made, divide costs by the total number enrolled in the program (or 
actually passing through a costed process phase of the program), not the 
number of students within a school actually sampled for that program. In the 
event that the census economic data (anticipated costs) show some especially 
high or low cost (per student) for a program given at some particular locale, 
MISOE may well want to ensure the inclusion of such a program-locale 
combination in the sample. It might be assumed that total costs are highly 
correlated with enrollments across schools in which they are given. This can 
and should be checked. If the correlation is low, MISOE may also want to 
ensure representation of program-locale combinations with extreme total costs. 
Moreover, some schools may have extreme patterns of allocation of their total 
costs. The census data should be inspected from this viewpoint to ascertain 
the need for modifying the sampling plan, probably by adding a cell here or 
there across the system; obtain both economic and noneconomic data in all 
cells, including added ones. Where adding cells is not possible, some tradeoff 
between the cell structure recommended earlier and the modifications 
indicated by these considerations will be required. 

Weighting recommendations must take into account the ways in which 
economic data at the program-school level and noneconomic data available at 
the student level are to be used in analysis. For entry level descriptive 
aggregation (and where such aggregates will be used in linear programming or 
dynamic simulation analysis), weighting of sampled process cost data may 
consist simply of the Type I cell weights, computed on the basis of the most 
similarly available cost data in the census system, instead of on the enrollment 
basis. In place of the options for computing differential weights by 
sex, race, or completion, the differentiation options here are the allocations, 
insofar as different kinds of sampled process costs are to be similarly 
allocated. Because of the uncertainties involved in the census allocations, 
MISOE may not find such differential weighting worth the effort, but it is as 
theoretically useful in control of sampling variations as the analogous options 
for the weighting of noneconomic data. Because the economic data are at the 
program level and the per student basis is computed from actual enrollments 
after applying the weight to the economic data, less than 100% sampling of 
students within schools is not relevant and therefore neither the Type II 
weight nor the Type III need be computed. 

In computing the economic weight (it should have another type desig- 
nation even though it is analogous to Type I), use the actual rather than the 
anticipated costs, even though there will be some delay involved in collecting 
the actual information. During the development and debugging phase of MISOE, 
however, weights may be computed using anticipated costs. 
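
A minimal sketch of the economic cell weight and the per student conversion follows. It assumes census total costs by school for one program-cell and a census enrollment for the cell; every name and figure is hypothetical, and the sampled process costs are taken equal to the census costs purely to keep the arithmetic visible.

```python
def economic_cell_weight(census_costs, sampled_schools):
    """Cost-based analogue of Weight I: census cost over all schools in the
    cell, divided by census cost in the schools actually sampled."""
    total = sum(census_costs.values())
    sampled = sum(cost for school, cost in census_costs.items()
                  if school in sampled_schools)
    return total / sampled

# Hypothetical census total costs for one program within one cell.
census_costs = {"school_a": 50_000.0, "school_b": 30_000.0, "school_c": 20_000.0}
sampled_schools = {"school_a", "school_b"}
weight = economic_cell_weight(census_costs, sampled_schools)

# Weight the sampled process costs up to the cell, then convert to a per
# student basis using actual (census) enrollment, not the sampled count.
sampled_process_cost = sum(census_costs[s] for s in sampled_schools)
cell_enrollment = 400                     # hypothetical census enrollment
estimated_cell_cost = weight * sampled_process_cost
cost_per_student = estimated_cell_cost / cell_enrollment
print(weight, estimated_cell_cost, cost_per_student)
```

Because the per student figure is formed after weighting and from actual enrollment, no Type II or Type III weight enters the cost computation.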

The weighted aggregates of economic and noneconomic data may be cross-tabulated 
at the same aggregation level, with costs on total, allocated, or 
per student bases. It is rather unlikely that the economic and noneconomic 
data will be used in the same regression analysis. Should this be required 
using schools, programs, or other aggregates above the individual level as the 
unit of analysis, it should only be necessary to ensure that aggregates at the 
same analysis unit level are combined in the data record, whether total, 
allocated, or per student cost data are used. When the student is the unit of 
analysis, the per student cost computed from cell-weighted process costs 
(total or allocated) and actual enrollment may be posted to the student 
record along with the data on the process to which that student was exposed. 
The entire student record will, of course, be weighted in terms of the enrollment-based 
Type III (or alternatives) weight discussed earlier. Note that this 
assumes that costs per student are constant for students taking the same 
program at the same school, an assumption that is being made anyway when com- 
puting and using per student figures. 

The sampling and weighting considerations for economic data need to be 
carefully reviewed during the MISOE development period as more census information 
becomes available and as the analysis systems are developed. 

Weighting for Nonresponse Bias in Followups 

Following the collection of impact data through followup procedures, 
longitudinal files for analysis across elements can be developed using the 
respondents to the followup. That portion of the impact data 
obtained by mail and/or survey of former students, will, of course, be missing 
for nonrespondents, introducing a bias. The approach to developing final 
subject weights for such longitudinal respondent files is to form the product 
of a weight adjusting the file for the nonresponse bias and the Type III 
weight (or its alternative). The nonresponse-adjusting weight, Weight F, may 
itself be the product of some interim weights, and takes the respondent data 
back to representation of the followup group. If all originally sampled 
students are followed up, the Type III weight takes the followup group back to 
represent the original student cohort population. If only the completors are 
followed up, contrary to recommendations, Type II and Type III weights would 
require adjustment (note that dropouts constitute a relevant control group in 
impact space and should be followed up). 

Separate followup weights, and therefore final respondent weights, 
would have to be computed for each followup at one, three, five, and possibly 
10 years. Special adjustments will be required if only the respondents to 
one followup are included in attempts to contact on a later followup. Decisions 
about some of these logistic matters on later followups will depend, in part, 
on the response rates observed in the earlier ones. 

The followup weight, and therefore the final subject weight, may be 
the product of interim weights designed to cope with differential response to 
multiple waves of intensive efforts to increase the response rate. Detailed 
recommendations for such elaborations of the followup weighting depend upon 
the details of followup logistics and the response rates for each wave. It 
is sometimes possible to treat the respondents to multiple waves as a single 
group for weighting purposes. To facilitate discussion of the development of 
the followup weight, it will be assumed that this is the case. Sporadic in- 
formation is available about the cost of various mailout techniques; the 
results of one such study are presented in Appendix A of this paper. It 
should be noted that these are only mailout costs and do not include costs 
for processing data on respondents, or for developing the weights. 

A typical logistic strategy is to follow the initial mailout, using 
first-class mail and live stamps, with a reminder postcard about a week later. 
Depending on the response rate to this first wave, a second wave mailout of 
the survey instrument may be made to the nonrespondents. A third wave using 
special delivery is sometimes useful. Experience with phone contacts of hard 
core nonrespondents indicates that this is a very expensive and not very productive 
technique. With a smaller group, possibly concentrated in a smaller 
region, it may be useful as a reminder device, but not for obtaining data. 
The Pentagon locator file may be helpful in ascertaining locations of those 
who have gone into military service. 

Essentially two kinds of weighting procedures for adjusting data for 
nonresponse bias are available. One, the actuarial stratification method, is 
the classical one; the other is the inverse response probability method, 
developed in the Cooperative Institutional Research Program of the American 
Council on Education. The two methods will be described with their relative 
advantages and disadvantages. An empirical study comparing the ability of the 
two methods in reducing bias indicates the superiority of the second method.* 

In the actuarial stratification procedure, sampling units are weighted 
in terms of variables purporting to measure and control for bias by defining 
a stratification based on those variables, and then computing weights based on 
the ratio of sampling units in the population in each cell to those in the 
sample. Variations in the method arise by differences in the choice of variables 
and levels, and their cross-tabulation or nesting in the stratification 
design. Once these decisions are made, the remaining weighting operations 
and bias estimation are usually straightforward. The method assumes within-cell 
homogeneity in the population and random sampling within cells. Such 
methods were recommended in Parts II and III for weighting for the original 
sampling. 

Application of the stratification approach to the followup weighting 
requires a new post hoc stratification and involves: 

1. Identification of variables related to the non-random bias 
resulting from differential probabilities of response to mailed questionnaires. 
This may be accomplished by capitalizing on available information and multicollinearity 
among variables used in prediction of nonresponse solely 
for identifying the key variables. Experience indicates that sex, race, ability, 
and aspirations are more frequently required control variables. 

2. Ascertaining levels on these variables defining the stratification 
cell structure. This may be accomplished by examining cell counts resulting 
from various choices of cutting points on the variables and ensuring 
sufficient counts to provide stable weights with as many control levels as 
possible. Intuitive judgement is required to ascertain the final definition 
of the cell structure. By this method, all respondents within a cell receive 
the same weight and their data represent not only their responses, but also 
presumed responses of nonrespondents who fall into the same cell. 

3. Computation of the cell weight as the number of subjects within 
the cell followed up divided by the number of respondents in that cell. 

* This study by Astin and Molm is, as yet, unpublished. The results will be ... 

It should be noted that, whether this method or the inverse probability 
of response method is chosen, weights must be computed separately for each 
program within sector. Also, in both methods, the data for each respondent, 
when weighted, represent an estimation of the data for nonrespondents who are 
similar on the control variables. The regression analysis identifying these 
variables should use input data (student characteristics) and program completion 
status as predictors of the dichotomous criterion: subject responded or 
not. Product data may add to the prediction. 
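
The cell weight of step 3 in the actuarial stratification procedure can be sketched as follows, for a single program within a sector. The post hoc cells (sex by ability level), the counts, and the Type III weight of 1.5 are all hypothetical, and the function name is an illustration rather than part of any existing program.

```python
from collections import Counter

def actuarial_weights(followed_up, respondents):
    """Followup cell weight: subjects followed up in the cell divided by
    the number of respondents in that cell."""
    n_followed = Counter(followed_up)
    n_responded = Counter(respondents)
    # Cells with no respondents have no defined weight; in practice such
    # cells would be merged with a neighbor when the structure is fixed.
    return {cell: n_followed[cell] / n_responded[cell]
            for cell in n_followed if n_responded[cell] > 0}

# Hypothetical post hoc cells defined by sex and ability level.
followed_up = [("F", "high")] * 50 + [("M", "low")] * 40
respondents = [("F", "high")] * 25 + [("M", "low")] * 10
followup_w = actuarial_weights(followed_up, respondents)

# Final subject weight: followup weight times the Type III weight already
# posted to the respondent's record (1.5 here is an invented value).
final_weight = followup_w[("M", "low")] * 1.5
print(followup_w, final_weight)
```

Each respondent in the low-response cell stands for four followed-up subjects, and the product with the Type III weight carries that representation back to the original cohort population.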

The other approach to weighting for the nonrandom nonresponse bias is 
one in which each respondent receives a weight based on the inverse probability 
of his being a respondent, given his profile on the control variables. The 
data from that respondent then represent not only his response, but also the 
presumed response of any nonrespondent with the same profile of input characteristics 
used as control variables. Stepwise regression is used not only to 
identify the control variables, but also to develop the equation for predicting 
the response probabilities. The variable-identification function of regression 
in both methods is especially useful where pretest information is available 
and the nonrandom sources of bias in the followup data are unknown. The 
advantages of the inverse-p approach to weighting include having variable 
weights for all subjects rather than constant weights for groups of subjects 
in a given stratification cell; moreover, no assumption is required about 
homogeneity and random sampling within cells. The disadvantage is the difficulty 
in providing formal estimates of bias and variance in parameters computed 
from the weighted distributions. 

One further issue is that possible curvilinear and interaction effects 
beyond the main effects discussed so far may be involved. In the actuarial 
stratification approach, such effects appear, and are controlled for, by 
variations in weights among the cells, reflecting cell variations in response 
rates. Both the appearance and degree of control are only as fine as the cell 
structure permits. In the inverse-p procedure, it is necessary to hypothesize 
which variables may be involved in such effects and include generated vectors 
representing them in the regression system. 

The adjusting weight for each respondent in the inverse probability 
method is essentially one divided by the predicted response probability. It 
is possible for an occasional predicted p-value to lie outside the range of 
0-1. Such values are set by the program to the theoretical extremes to prevent 
weights being less than one. Normalizing adjustments and other refinements 
are available, some of which may not be feasible with the sizes of respondent 
samples anticipated in MISOE. One of the refinements is a ceiling placed on 
the weights, to protect against undetected errors. 
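
The inverse probability weight, with out-of-range predictions pulled back and a ceiling applied, might look like the sketch below. The ceiling of 10.0 is an arbitrary illustrative value, not one recommended in this paper, and the predicted probabilities are invented; in practice they would come from the regression equation for response.

```python
def inverse_p_weight(predicted_p, ceiling=10.0):
    """Nonresponse weight: one over the predicted response probability,
    with out-of-range predictions clamped and a ceiling on the result."""
    p = min(predicted_p, 1.0)   # p above 1 would give a weight below one
    if p <= 0.0:                # p at or below 0 has no finite inverse
        return ceiling
    return min(1.0 / p, ceiling)

# Hypothetical predicted response probabilities from the regression.
predicted = [0.5, 1.08, 0.05, -0.02]
weights = [inverse_p_weight(p) for p in predicted]
print(weights)
```

The prediction above one yields a weight of exactly 1.00, while the very small and the negative predictions are both held at the ceiling instead of producing extreme weights.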

Experience with the stepwise multiple regressions for predicting response 
shows, rather consistently, a leveling off at a multiple R of about .20, and 
rather consistently, a selection of sex, race, and some measure of ability 
and/or achievement as the key variables. The rather low multiple R could be 
the result of nonresponse being related to factors not measured in the input 
space, but this is rather unlikely given a large number and different kinds 
of measures available and allowed to enter freely into the regressions. More 
likely, much nonresponse can be considered as random effects once the demographic 
and ability factors have accounted for the nonrandom effects. It may 
turn out that weighting followup samples within programs within sectors has 
taken care of much of the nonresponse bias. Some simplifying options may become 
available in MISOE after some experience is obtained weighting the first 
followups. 

Software Considerations in Weighting 

Program GENWTS, available from the author, will be quite useful in 
developing the weights for the original sample and for weighting followup 
data by the actuarial method. Provision is offered for differential weighting 
for two subgroups, and the program can be readily modified to handle either 
more differentiations or the weighting of cost data. 

The regression package, assumed to be available for general analysis 
purposes in MISOE, will be useful in identifying control variables for weighting 
followup data by either method. Software for implementing the inverse probability 
method for followup weighting can be made available, but will probably require 
some adaptation for applications in MISOE. Separate programs are required 
for computing the weights and for integrating them with the other weights for 
the final longitudinal analysis files. 

Weights of all kinds are normally computed in floating point with 
care taken not to lose high order digits. Weights should be carried to two 
decimal places, multiplied by 100 and integerized for reading onto the analysis 
files. When the weights are read back into the computer for analysis using 
floating point, the F-conversion can be used to return the weights to their 
original form. In programs using integer conversion only (e.g., some cross- 
tabulation and head-counting programs) the integer form may be read in and 
final counts divided by 100. 
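
The storage convention just described (two decimal places, multiplied by 100 and integerized) can be illustrated in a few lines. This is a sketch in a modern language, not the FORTRAN-style F-conversion itself; the function names and sample weights are hypothetical.

```python
def to_file_form(weight):
    """Carry the weight to two decimal places and store it as an integer."""
    return int(round(weight * 100))

def from_file_form(stored):
    """Return a stored integer weight to its original decimal form."""
    return stored / 100.0

# Hypothetical floating point weights for three student records.
stored = [to_file_form(w) for w in (1.0, 1.4286, 6.0)]

# An integer-only head-counting program sums the stored form and divides
# the final count by 100.
weighted_count = sum(stored) / 100
print(stored, weighted_count, from_file_form(stored[1]))
```

The only loss is the rounding to two decimals (1.4286 is stored as 143), which is usually negligible next to the sampling and nonresponse uncertainties in the weights themselves.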

Estimation of Parameters and Sampling Errors 

Occasional Paper No. 7 (p. 18) presents the general basic equations for 
deriving aggregated counts, sums, sums of squared data elements and cross 
products, means, and elements of computing formulas for variances and covari- 
ances. These general formulas become specific as the indices of summation 
of student record data are specified to define an aggregate of analytical 
interest. With the weighting schemes suggested in this paper, the aggregates 
may cut across the stratification structures and apply to any grouping of 
individual records of interest. The parameters so estimated for the various 
populations and subpopulations in MISOE will not, in general, be unbiased, 
i.e., an average of estimates on replications of the system will not necessarily 
be exactly equal to the "true" population parameters. Moreover, if such 
replications were possible, the estimates would vary (sampling error). The 
recommendations given in this paper are designed to reduce the risks of both 
bias and variance, but it should be recognized that neither can be completely 
eliminated. It is sensible for MISOE to use census data wherever it is 
available in preference to weighted sample data. 

It is very difficult, if indeed possible, to provide precise estimates 
of sampling fluctuations and of bias in estimated parameters in a system of 
this kind. In fact, no attempt will be made to do so, but the following 
comments will give some basis for making subjective estimates. 

Classical formulas for computing sampling errors for simple random 
samples from an infinite population provide a very rough idea of maximum random 
sampling fluctuations. These are generally inversely related to the square 
root of the sample size, using the actual, not the weighted sample N's. 
Random sampling error, so computed, is reduced in two ways. First, sampling 
is from finite, rather than from infinite populations. The reduction factor 
in the sampling error for finite sampling amounts to $\sqrt{1 - N_s/N_p}$, where 
$N_s$ is the sample size and $N_p$ the population size, and in MISOE 
will probably be a much more important source of reducing sampling errors than 
the second way, stratification. 
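
To give a feel for the size of the finite sampling reduction, the sketch below computes the classical simple random sampling error of a mean and applies the factor $\sqrt{1 - N_s/N_p}$. The standard deviation and the sample and population sizes are invented for illustration, and the function is not part of any MISOE program.

```python
import math

def standard_error_of_mean(sd, n_sample, n_population):
    """Simple random sampling error of a mean, reduced by the finite
    sampling factor sqrt(1 - N_s / N_p)."""
    srs_error = sd / math.sqrt(n_sample)              # infinite population
    reduction = math.sqrt(1.0 - n_sample / n_population)
    return srs_error * reduction

# Hypothetical program: 100 students actually sampled out of 400 enrolled.
se = standard_error_of_mean(sd=10.0, n_sample=100, n_population=400)
print(se)
```

With a quarter of the population sampled, the factor is $\sqrt{0.75}$, so the sampling error is about 13 percent smaller than the infinite population formula suggests; at 100% sampling the factor, and hence the random sampling error, goes to zero.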

The calculation of reduction in sampling error due to stratification is 
considerably more formidable and somewhat variable, depending on the type of 
estimator involved. The correction is a function of the among-cells variation 
about the estimator, which is related to the covariation between the variable 
(item), the distribution parameter of which is being estimated, and the variables 
defining the cell structure. In a general purpose sampling situation with 
many kinds of data involved and an a priori need to represent certain parts of 
the total system, as in MISOE, the stratification probably has small and 
variable effects on reduction of sampling errors. It does introduce some 
control by constraining sampling fluctuations against unlucky wild variations 
that could occur by chance under pure random sampling. 
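
The dependence on among-cells variation can be sketched numerically. This sketch assumes proportional allocation and ignores finite-population terms, which is a simplification and not the disproportionate design recommended for MISOE; the two strata and their proportions are hypothetical.

```python
def variance_components(weights, means, variances):
    """Split the population variance into a within-strata part and an
    among-strata part, given stratum weights, means, and variances."""
    overall_mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    among = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, means))
    return within, among


def relative_variance(weights, means, variances):
    """Approximate ratio of the variance of a stratified mean
    (proportional allocation) to that of a simple random sample mean
    of the same size."""
    within, among = variance_components(weights, means, variances)
    return within / (within + among)


# Hypothetical item: a proportion of 0.4 in one stratum and 0.6 in the other,
# with the strata equal in size.
weights = [0.5, 0.5]
proportions = [0.4, 0.6]
variances = [p * (1.0 - p) for p in proportions]

ratio = relative_variance(weights, proportions, variances)
print(f"stratified variance as a share of SRS variance: {ratio:.2f}")
```

Unless the item differs sharply across cells, the among-cells component is a small share of the total, so the gain in precision is modest.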

For the few items common between census and sample data systems (e.g., 
sex and race), obtained in the one case from school records as counts and in the 
other from students completing input protocols, a comparison of weighted 
aggregate counts and proportions with those in the census will give some 
check on the weighting procedures and some idea of the overall efficacy of 
the sampling and weighting operations. 
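
Such a check might be sketched as follows. The record layout (a "sex" field and a "weight" field), the sample records, and the census counts are all hypothetical and serve only to show the comparison.

```python
def weighted_proportions(records, key):
    """Weighted proportion of each category of `key` in the sample."""
    totals = {}
    for rec in records:
        totals[rec[key]] = totals.get(rec[key], 0.0) + rec["weight"]
    grand_total = sum(totals.values())
    return {category: t / grand_total for category, t in totals.items()}


def differences_from_census(sample_props, census_counts):
    """Weighted sample proportion minus census proportion, by category."""
    census_total = sum(census_counts.values())
    return {
        category: sample_props.get(category, 0.0) - count / census_total
        for category, count in census_counts.items()
    }


# Hypothetical weighted sample records and census counts for one program.
sample = [
    {"sex": "F", "weight": 10.0},
    {"sex": "F", "weight": 12.0},
    {"sex": "M", "weight": 8.0},
    {"sex": "M", "weight": 10.0},
]
census = {"F": 560, "M": 440}

diffs = differences_from_census(weighted_proportions(sample, "sex"), census)
for category, diff in sorted(diffs.items()):
    print(f"{category}: weighted sample minus census = {diff:+.3f}")
```

Differences well beyond what sampling fluctuation would explain would point to a problem in the weights or in the data collection rather than to chance.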

Bias in the estimators is a more serious matter in its influence in 
analysis and as a possible source of inferential errors. Biases tend to be 
in unknown directions and amounts. The recommended strategy is to use logistics 
of data collection which minimize bias and weighting procedures which 
identify and correct for detectable bias. 

Part VI. Epilogue 

This paper has discussed numerous sampling and weighting issues and 
options for MISOE with illustrative or suggestive recommendations. There 
remains considerable need for follow through during the development phase of 
MISOE to ensure sound choices among the options offered, more delineation of 
the integration of economic and noneconomic aspects of the system, finalization 
of sampling and followup logistics, and integration of these matters with the 
development of the data and analysis systems. 

The tables in this appendix present "cost-effectiveness" data on various 
mail-out and followup techniques which were used in a fall 1971 followup of the 
1966 cohort of freshmen in the ACE Cooperative Institutional Research Program. 

There were approximately 60,000 former students in the 1966 cohort. 
From the total group, 14 random samples of 1000 were chosen as "experimental" 
groups. Students without ZIP codes were then deleted from these groups, resulting 
in slightly varying sample sizes. 

Experimental "treatments" included the following: 

A. Outgoing Postage 
   1. First-class live stamps (@ 16¢) 
   2. Non-profit rate, printed permit (@ 1.7¢) 
   3. Non-profit rate, pre-canceled stamps (@ 2¢) 
   4. Non-profit rate, metered postage (@ 2¢) 
B. Outgoing Envelope 
   1. Window 
   2. Non-window, requiring matched insertion of the questionnaire 
C. Return Postage 
   1. Live stamps (@ 16¢) 
   2. Business reply (@ 18¢) 
D. Postal Card Reminder 
   1. Received 
   2. Did not receive 
E. Second Wave Questionnaire 
   1. Received 
   2. Did not receive 
F. Auto-Typed Letter Inserted 
   1. Received 
   2. Did not receive 

The assignment of treatments, the costs associated with those treatments, 
and the percentage of response to the various techniques are outlined in Part 
1 of the enclosed tables. The first wave of questionnaires was sent during 
the first week of November; the reminder postal card was mailed four or five 
days later; and the second wave of questionnaires went to all non-respondents 
(whose initial questionnaires had not been returned as non-deliverable) during 
the last week of December. 

Since approximately 15% of non-profit rate outgoing questionnaires were 
returned as non-deliverable (as compared to 8 or 9% of those sent out first-class), 
it was decided to remail questionnaires with first-class postage to those whose 
original questionnaires were sent out at the non-profit rate but were returned 
as non-deliverable. The "Part 2" table combines the Part 1 data with the 
non-deliverable remail outcomes. 

One or two other analyses remain to be done (e.g., half of the second 
wave questionnaires were sent non-profit and half with first-class postage), but 
on the basis of the enclosed data, a followup of the 1968 cohort of freshmen 
which is going out later this summer will probably use the following approach: 
non-profit postage on first wave; window envelope; business reply return; a 
second wave questionnaire (with a printed form letter inserted) sent to 
non-respondents; a postal card reminder to second wave questionnaire non-respondents; 
and a first-class remail to non-delivered first wave questionnaires. 

For the City University of New York project, we sent questionnaires (with 
non-profit postage and live stamp returns) in mid-September (1971) to a random 
sample of 2984 students who had enrolled (as freshmen) at one of 14 CUNY campuses 
in the fall of 1970. A week later, we sent all of them a reminder postal card, 
and in mid-October a second wave of questionnaires went out to all non-respondents. 
A month later (November 12), a short-form (postal card) questionnaire was sent 
special delivery with a personalized auto-typed letter enclosed to 1560 non-respondents. 
At the end of November, the names and fall 1970 home addresses of the 
860 students who had not responded to either the full or postal card questionnaire 
were sent to a New York City survey research firm. The firm's interviewers 
attempted to reach by telephone each student or someone who could provide 
information about the student. For the final phase of the data collection 
process, we sent the firm the names and addresses of a random sample of 100 students who 
had not returned a questionnaire and could not be reached on the telephone. 
Interviewers went to the students' fall 1970 addresses and attempted to talk with 
the student or a member of his family. The percentage of response to the various 
techniques is summarized below: 

                    Full            Postal Card                   Personal 
                    Questionnaire   Questionnaire   Telephone     Interview    Total 

Number Sent or      2984                                                       2984 
in Group or 
Contacted 

Percentage          51%             22%             71%           43%          84% 
Response 

Although the response rate increased from 51% to 84% by application of the 
intensive followup techniques, the amount of information obtained from respondents 
to postcard questionnaires, telephone contacts, and personal interviews was markedly 
reduced, being confined to a few critical items, not connectable with other 
followup data available in the full questionnaire. This implies either a rather 
drastic application of the "missing data" options for analysis or a different set 
of weights to be applied when full data and partial data are to be analyzed. Such 
intensive followups can be useful in further characterizing non-respondents and 
possibly modifying weighting procedures. 

The costs for these intensive followup procedures are rather high, as 
indicated by the following summary data: 

1. Questionnaire with Special Delivery, Auto-typed Letter 

The postal card questionnaire was sent to 1560 non-respondents to the 
full-length questionnaire, for a total cost (postage, printing, auto-typing, 
etc.) of $2187.76. Three hundred and forty-eight students returned the 
short-form questionnaires for a cost per response of $6.28. About one-fourth 
to one-third of the cost was for the auto-typed letter. 

2. Telephone Interview 

A New York City survey research firm was given the names and fall 1970 
home addresses of the 860 CUNY freshmen. They contacted 659 and obtained 
data over the telephone from 608 of those students at a cost of $4.75 per 
respondent. 

3. Personal Interview 

The same survey research firm was given the names and fall 1970 home 
addresses of a random sample of 100 "hard core" non-respondents. They obtained 
usable data from 45 of these students at a cost per respondent of $40. The 
sum was paid for locating about 85. On an actually interviewed basis, cost 
per respondent was about $25. 
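
The arithmetic behind these followup figures can be retraced with a short sketch. The counts and the $2187.76 total are those stated in the text; the variable and function names are illustrative only.

```python
# Counts reported in the text for the three intensive followup phases.
PHASES = {
    "postal card questionnaire": {"attempted": 1560, "responded": 348},
    "telephone interview": {"attempted": 860, "responded": 608},
    "personal interview": {"attempted": 100, "responded": 45},
}
POSTAL_CARD_TOTAL_COST = 2187.76  # dollars, as reported


def response_rate(attempted, responded):
    """Share of those approached in a phase who supplied data."""
    return responded / attempted


def cost_per_response(total_cost, responded):
    """Total phase cost divided by the number of respondents."""
    return total_cost / responded


rates = {
    name: response_rate(counts["attempted"], counts["responded"])
    for name, counts in PHASES.items()
}
postal_card_unit_cost = cost_per_response(
    POSTAL_CARD_TOTAL_COST, PHASES["postal card questionnaire"]["responded"]
)

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} response")
print(f"postal card cost per response: {postal_card_unit_cost:.3f} dollars")
```

The postal card quotient is about 6.287, which the text reports as $6.28. The personal interview counts (45 of 100) give 45%, whereas the summary table shows 43%; the source does not explain the difference.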

Appendix B 

July xO, 1972 

To: Dr. John A. Creager 

From: Dr. William G. Conroy, 
Principal Investigator 

Technical Memorandum #2 

Subject: A Note on MISOE Sample 


As a follow up to our Washington discussion, I felt it would be 
useful to communicate my conception of the MISOE sample. 

Essentially, MISOE includes a 3-stage information collection process, 
with data connections across all three stages. 

Stage 1. MISOE-CDS - For all 94 programs of Occupational Education 
a description of anticipated and real cost, enrollment and number of completers. 
MISOE-CDS includes an analysis system which "automatically" provides 
detailed and timely summary reports of MISOE-CDS data for appropriate management 
levels of occupational education. MISOE-CDS input data is restricted to age, 
sex and race. 

Stage 2. MISOE-SDS(1) - For a representative sample from each of 
the 94 programs of occupational education in Massachusetts a detailed description 
of input and impact. This allows for cost/impact analysis by program, 
controlling for input types. It also allows for considerable comparative 
analysis among and across occupational education programs. Stage 2 input and 
impact data must be connectable to Stage 1 cost data by program for analysis. 
(It should be noted that Stage 1 and Stage 2 data constitute entry level 
data for Stage 3 MISOE-SDS(2) data described below. The concept of entry level 
analysis was initiated in Occasional Paper #3 and might be referenced at this 
time). 

Stage 3. MISOE-SDS(2) - For a representative sample of each of the 
occupational education programs with an enrollment of approximately 800 or 
more a detailed description of the product and process of these programs, 
including product-cost (by behavioral objective) information. General educational 
development data will also be obtained for Stage 3 programs. To 
establish connectability between total MISOE and occupational education 
practitioners during the initial development and implementation of MISOE, all 
LEAs offering programs classified in Stage 3 will be asked to report behavioral 
objectives to the State when they report anticipated enrollment and cost data. 

It is helpful to note that total MISOE will be operational for all Stage 
3 programs but not for programs excluded (because of limited enrollment) from 
Stage 3. For programs excluded from Stage 3, process-product data will not be 
available. Fundamentally, this means that the within occupational education 
manager will not be accommodated for non-Stage 3 occupational education 
programs. 

Finally, for a representative sample of students enrolled in non- 
occupational educational programs at the secondary and post-secondary levels 
input, impact, general educational development and program cost data will be 
gathered. This allows for comparative analysis between occupational and non- 
occupational education described in our Occasional Papers. At the secondary 
level this includes students enrolled in general and academic programs, while 
at the post-secondary level it includes students pursuing academic programs. 

MISOE samples will be drawn at the time of initial enrollment for 
each program and followed through to program completion and into impact space 
for all stages, i.e., MISOE is fundamentally a longitudinal data system. 
Cohorts are replaced upon program completion. During FY'73 MISOE-SDS(1) and (2) 
will be identified and established for longitudinal study. At the same time a 
Stage 2 and 3 cross-sectional sample will be identified and formed for programs 
included in MISOE-SDS(2) and (3), and impact data will be gathered 
during FY'73 on a 1, 3, 5 and 10 year basis, thus forming a basis for initial 
analysis across all MISOE subsections. Cross-sectional data will be appropriately 
identified in both analysis and inputs. Such a cross-sectional consideration 
will allow the MISOE analysis system to be tested during FY'73, the 
last planning year, and provide a substantial amount of useful information. 

The following is a tentative list of occupational educational programs 
included in the Stage 3 information collection process or MISOE-SDS(2). 
Please note that current information does not make clear the enrollment 
by program within the electrical, electronic and metalworking programs. The 
same is true for graphic arts and woodworking, but we consider these to be one 
program. I also believe electrical occupations describe one program, but this 
is not the case for metalworking and electronics. Therefore, under electronics I am 
listing industrial electronics and communications and under metalworking I am 
listing separate programs entitled machine shop, sheet metal and welding. 

Also included is a distribution of enrollment by occupational education 
programs and level. 

SDS(2) OCCUPATIONAL EDUCATIONAL PROGRAMS 

1. AGRICULTURE OCCUPATIONS 

   There are no agricultural programs for Stage 3. 

2. DISTRIBUTION OCCUPATIONS 

   a. Apparel and Accessories 
   b. General Merchandise 

3. HEALTH OCCUPATIONS 

   a. Nurse - Associate Degree 
   b. Practical Nursing 

4. HOME ECONOMICS OCCUPATIONS 

   a. Comparative Homemaking 
   b. Care and Guidance of Children 

5. OFFICE OCCUPATIONS 

   a. Accounting and Computing 
   b. Business Data Processing Systems 
   c. Filing and Office Machines and General Office Clerical 
   d. Information Communications Occupations 
   e. Stenography and Secretarial Occupations 
   f. Typing Occupations 

6. TECHNICAL - There is not enough enrollment indicated in any of the 
   technical programs for inclusion in Stage 3. 

7. TRADES AND INDUSTRY 

   a. Automotive Services 
      (1) Body and Fender 
      (2) Mechanics 
   b. Construction and Maintenance Trades 
      (1) Carpentry 
      (2) Electricity 
      (3) Plumbing and Pipefitting 
   c. Drafting Occupations (Assumption: 1 Program) 
   d. Electrical Occupations (Assumption: 1 Program) 
   e. Electronic Occupations (Assumption: 2 Programs) 
      (1) Communications 
      (2) Industrial Electronics (Enrollment may be concentrated only 
          on this program). 
   f. Graphic Arts Occupations (Assumption: 1 Program) 
   g. Metalworking Occupations (Assumption: 3 Programs) 
      (1) Machine Shop 
      (2) Sheet Metal 
      (3) Welding 
   h. Cosmetology 
   i. Metallurgy (Assumption: 1 Program) 
   j. Public Service Occupations 
      (1) Firemen Training 
      (2) Law Enforcement Training 
   k. Quantity Foods Occupations (Assumption: 1 Program) 
   l. Woodworking Occupations (Assumption: 1 Program) 
      (1) Millwork and Cabinetmaking 

ENROLLMENT DISTRIBUTION 

                          %   Secondary   Post Secondary     Adult      Total 
                                    78%                        10% 

Agriculture              1%         882               50        51        983 
Distribution Occ.        3%       3,398              407       120      3,925 
Health                   3%         425            2,417       157      2,999 
Home Economics           5%       4,979            1,037       136      6,152 
Office                  59%      61,383            5,874     1,539     68,796 
Technical                2%         345            1,297       331      1,973 
Trades and Industry     28%      19,551            3,709     9,183     32,443 

Total                  100%      90,963           14,791    11,517    117,271 
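
As a cross-check on the enrollment table, the row totals, level totals, and percentage shares can be recomputed from the cell counts. The sketch below uses the counts as read from the table; the dictionary layout and function names are illustrative only.

```python
# Enrollment by area: (secondary, post-secondary, adult), as read from the table.
ENROLLMENT = {
    "Agriculture": (882, 50, 51),
    "Distribution Occ.": (3398, 407, 120),
    "Health": (425, 2417, 157),
    "Home Economics": (4979, 1037, 136),
    "Office": (61383, 5874, 1539),
    "Technical": (345, 1297, 331),
    "Trades and Industry": (19551, 3709, 9183),
}


def row_totals(table):
    """Total enrollment for each occupational area."""
    return {area: sum(cells) for area, cells in table.items()}


def level_totals(table):
    """Total enrollment for each level (secondary, post-secondary, adult)."""
    return tuple(sum(column) for column in zip(*table.values()))


def percent_shares(table):
    """Each area's share of total enrollment, rounded to a whole percent."""
    totals = row_totals(table)
    grand_total = sum(totals.values())
    return {area: round(100 * t / grand_total) for area, t in totals.items()}


totals = row_totals(ENROLLMENT)
levels = level_totals(ENROLLMENT)
shares = percent_shares(ENROLLMENT)

for area in ENROLLMENT:
    print(f"{area}: {totals[area]} ({shares[area]}%)")
print("level totals:", levels, "grand total:", sum(levels))
```

Because each share is rounded separately, the rounded percentages need not sum to exactly 100.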